Identification of Objects in Real Images
Category A
New Mexico High School
Supercomputing Challenge
Final Report
April 5, 2000
Team 028
Alamogordo High School
Team Member:
Jeremy Pepper
Teacher:
Albert Simon
Project Mentor:
Barak A. Pearlmutter
Table of Contents
Introduction
The Project
The Program
Results
Conclusions
Appendix
Bibliography
Introduction
Computers have always been useful for certain tasks, mainly those involving mathematical computation. However, we usually consider them useless for more "human" tasks. One such task is image identification: how can we get a computer to determine what object is in a photograph? This problem has received much attention, but no efficient systems are in common use.
Neural networks are a computational method that allows a computer to learn new tasks when presented with examples. For example, one could teach a neural network the XOR function by giving it the following table:
Inputs  | Output
 0   0  |   0
 1   0  |   1
 0   1  |   1
 1   1  |   0
The computer would then be expected to come up with its own method of producing these results. Of course, this is not a good application of neural networks, since XOR is trivial to compute directly; it serves here only as an illustration.
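To make the idea concrete, here is a small sketch of a network learning the table above with the standard backpropagation rule. This is not the project's program; the 2-2-1 layer sizes, the learning rate, and the cycle count are arbitrary illustrative choices.

/* Minimal sketch: a 2-2-1 network learning XOR by backpropagation.
 * Sizes, learning rate, and cycle count are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define RATE 0.5

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void)
{
    double in[4][2]  = {{0,0},{1,0},{0,1},{1,1}};
    double target[4] = {0, 1, 1, 0};
    double wh[2][2], bh[2], wo[2], bo;   /* hidden weights/biases, output weights/bias */
    double h[2], o, dout, dh[2];
    int i, j, p, cycle;

    /* Start from small random weights */
    srand(1);
    for(i = 0; i < 2; i++)
    {
        bh[i] = (double) rand() / RAND_MAX - 0.5;
        wo[i] = (double) rand() / RAND_MAX - 0.5;
        for(j = 0; j < 2; j++)
            wh[i][j] = (double) rand() / RAND_MAX - 0.5;
    }
    bo = (double) rand() / RAND_MAX - 0.5;

    for(cycle = 0; cycle < 20000; cycle++)
    {
        for(p = 0; p < 4; p++)
        {
            /* Forward pass */
            for(i = 0; i < 2; i++)
                h[i] = sigmoid(bh[i] + wh[i][0] * in[p][0] + wh[i][1] * in[p][1]);
            o = sigmoid(bo + wo[0] * h[0] + wo[1] * h[1]);

            /* Backward pass: output delta, then hidden deltas */
            dout = (target[p] - o) * o * (1.0 - o);
            for(i = 0; i < 2; i++)
                dh[i] = dout * wo[i] * h[i] * (1.0 - h[i]);

            /* Weight and bias updates */
            bo += RATE * dout;
            for(i = 0; i < 2; i++)
            {
                wo[i] += RATE * dout * h[i];
                bh[i] += RATE * dh[i];
                for(j = 0; j < 2; j++)
                    wh[i][j] += RATE * dh[i] * in[p][j];
            }
        }
    }

    /* Show what the trained network produces for each input pair */
    for(p = 0; p < 4; p++)
    {
        for(i = 0; i < 2; i++)
            h[i] = sigmoid(bh[i] + wh[i][0] * in[p][0] + wh[i][1] * in[p][1]);
        o = sigmoid(bo + wo[0] * h[0] + wo[1] * h[1]);
        printf("%g XOR %g -> %f\n", in[p][0], in[p][1], o);
    }
    return 0;
}

With an unlucky random starting point a network this small can settle into a local minimum; adding a hidden node or changing the seed usually fixes that.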
Neural networks could be extremely useful for identifying objects, since a network only needs to be presented with various examples of each object. It would then seek out something unique about each object, allowing it to identify those objects in other photographs.
The Project
The project was to create a program capable of identifying a specific object within a photograph. The scope was further limited to 100x100 grayscale images and a small number of examples, which made the project small enough to complete in the time available. However, it is still being extended to include more examples and larger images.
The Program
The program used is a small neural network simulator I wrote specifically for this project. The neural network is stored as a 3x10000 matrix of type node. Each node keeps track of its current state, its activation threshold, and the weights of the connections between it and the nodes of the next layer. The program currently implements four main functions: RunNetwork(), RunNetworkWithFile(), TrainNetwork(), and GetError(). The actual code is reproduced in the appendix.
RunNetwork() simply forward-propagates an image from the example set through the neural network. It takes one argument, the number of the example image to use. It returns nothing; its results are stored as the states of the nodes in the output layer of the neural network.
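In other words, each node j in the hidden and output layers computes a weighted sum of the previous layer's states plus its own threshold and squashes the result with the sigmoid function. Using s for a node's state, w for a connection weight, and theta for the activation threshold (the same quantities stored in the node structure), this is

s_j = \sigma\Big(\theta_j + \sum_i w_{ij}\, s_i\Big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}

where the sum runs over the nodes i of the previous layer; the sigmoid appears in the code as 1.0 / (1.0 + exp(0.0 - val)).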
RunNetworkWithFile() is similar, except it loads a RAW-format image file from the file handle passed to it.
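For reference, the RAW format expected by this function is simply 100x100 = 10,000 unsigned 8-bit grayscale values with no header, read one byte per pixel. As a small sketch (the file name "test.raw" and the gradient pattern are made up purely for illustration), a compatible test file could be generated like this:

/* Sketch: write a 100x100 headerless 8-bit grayscale image in the
 * byte layout RunNetworkWithFile() reads. The file name and the
 * gradient pattern are only illustrative. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("test.raw", "wb");
    int x, y;
    unsigned char c;
    if(!f)
        return -1;
    for(y = 0; y < 100; y++)
    {
        for(x = 0; x < 100; x++)
        {
            c = (unsigned char) ((x + y) % 256); /* simple diagonal gradient */
            fwrite(&c, sizeof(unsigned char), 1, f);
        }
    }
    fclose(f);
    return 0;
}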
TrainNetwork() is the function that does most of the work. It calls GetError() to compute the error at each node and then updates the connection weights and activation thresholds accordingly. It is passed the number of training cycles to run, since training takes many passes over the examples.
GetError() makes a call to RunNetwork() and then determines the error value for each node.
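Together these two functions follow the standard backpropagation rule. With t_k the desired output for output node k, o_k its actual state, s the states of the other nodes, and epsilon the learning rate EPSILON, the quantities the code computes are just the written-out form of the loops in GetError() and TrainNetwork():

\delta_k^{\mathrm{out}} = (t_k - o_k)\, o_k (1 - o_k),
\qquad
\delta_j^{\mathrm{hid}} = s_j (1 - s_j) \sum_k \delta_k^{\mathrm{out}} w_{jk}

\Delta w_{ij} = \varepsilon\, \delta_j\, s_i,
\qquad
\Delta \theta_j = \varepsilon\, \delta_j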
Results
The learning curve was extremely slow, possibly indicating an undersized neural network, since a network with too few nodes is incapable of learning all of the data. After about 3000 cycles it was improving and could begin to pick out common features, although it still made errors because it had not yet been trained long enough.
It identified half of the six photos of floppy disks correctly, and the ones it missed it classified as calculators, which suggests it was generalizing too much; the low-resolution images may be partly responsible. It got none of the ball images correct: it misidentified one as a CD and the other as a floppy disk. The first ball image did contain nearly rectangular regions, and the second ball is, of course, round like a CD, so again the network was not paying attention to detail, which more training should fix.
It identified one of the two CDs correctly and called the other a disk, but the output in that direction was weak (0.229928 out of a possible 1.0), which further shows the network was not yet mature. The first image of a computer was strongly misclassified as a calculator, but to an undertrained network the two are similar: both have keys or buttons, and both are basically rectangular. The second computer image it called a disk, and the resemblance is easy to see when that image is placed next to one of the disk images used in training: both have a lightly colored rectangular region near the top followed by darker content, and the computer image even has a light rectangular patch near its disk drive. The network simply needed to pay more attention to detail, and a longer training run should fix this, as it has in my previous experience with neural network simulations. Both calculator images were identified as disks.
After raising EPSILON from 0.5 to 0.99 and training for 500 more cycles, the program learned some of the other images. However, it became less accurate on the images it had previously been "unsure" about, that is, those whose highest output value was nowhere near 1.0.
Conclusions
The computer did begin to recognize some of the objects, and it showed evidence that it was close to recognizing others. Although this is a small network with limited capabilities, I believe the results show that neural networks can successfully identify objects in photographs, since the network was still learning new patterns when training stopped. The results also suggest, however, that this particular network was too small.
Appendix
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h> /* for strcpy() */
#define BUILD_NUMBER 2
#define IMG_SIZE_X 100
#define IMG_SIZE_Y 100
#define NUM_IMAGES 14
#define NUM_INPUTS 10000
#define NUM_HIDDEN 224
#define NUM_OUTPUT 5
#define EPSILON 0.5
typedef struct
{
float state;
float weights[224];
float thresh;
}node;
void DemoMode(char *);
void RunNetwork(int); /* Receives pattern number */
void RunNetworkWithFile(FILE *); /* Receives file handle */
void TrainNetwork(int); /* Receives number of cycles*/
void GetError(int); /* Receives pattern number */
node NeuralNet[3][10000];
float error[2][10000]; /* Error for each layer */
float error2[2][10000]; /* Made global due to errors trying to make it local */
float werror[2][224][10000];
int numNodes[3] = {10000, NUM_HIDDEN, NUM_OUTPUT};
int numWeights[3] = {NUM_HIDDEN, NUM_OUTPUT, 0};
float imgs[NUM_IMAGES][IMG_SIZE_X][IMG_SIZE_Y];
float datout[NUM_IMAGES][NUM_OUTPUT];
/* First layer: 10000 nodes *
* Second layer: 224 nodes *
* Third layer: 5 nodes */
int main(int argc, char *argv[])
{
char choice = ' ';
char datFile[256], netFile[256], imgFile[256];
int layer, n, weight, i, cycles, hi;
float avgerr = 0.0, err, hiout;
FILE *dat, *net, *img;
printf("NMHSSCC Team 028's project\n");
printf("Identification of Objects in Real Images\n\n");
printf("Neural network training program\n");
printf("Build number %i\n\n", (int) BUILD_NUMBER);
printf("Menu:\n");
printf(" 1. Demo mode (for presentations)\n");
printf(" 2. Default mode (use training.dat to train neural.net)\n");
printf(" 3. Custom mode (use files specified by user)\n");
while (choice > '3' || choice < '1')
{
printf("Please enter your choice: ");
scanf(" %c", &choice); /* leading space skips leftover whitespace */
}
switch(choice)
{
case '1':
DemoMode(argv[0]);
return 0; /* demo mode is only a stub; no training files are selected in this case */
case '2':
printf("Using default files in training.\n");
strcpy(datFile, "training.dat");
strcpy(netFile, "neural.net");
break;
case '3':
printf("Training data file: ");
scanf("%s", datFile);
printf("Neural network file: ");
scanf("%s", netFile);
}
if(!(dat = fopen(datFile, "rb")))
{
printf("%s not found!\n", datFile);
return -1;
}
fread(imgs, sizeof(imgs), 1, dat);
fread(datout, sizeof(datout), 1, dat);
fclose(dat);
if(!(net = fopen(netFile, "rb")))
{
printf("Network %s not found, creating a new one.\n", netFile);
printf("Initializing neural network...");
srand(1);
for(layer = 0; layer < 3; layer++)
{
for(n = 0; n < numNodes[layer]; n++)
{
NeuralNet[layer][n].thresh = (float) rand() / RAND_MAX;
/* Each node has only numWeights[layer] outgoing connections; */
/* looping to numNodes[layer] would overrun the weights array */
for(weight = 0; weight < numWeights[layer]; weight++)
{
NeuralNet[layer][n].weights[weight] = (float) rand() / RAND_MAX;
}
}
}
printf("done.\n");
printf("Saving the new network to %s...", netFile);
if(!(net = fopen(netFile, "wb")))
{
printf("error!\nCannot create file!\n");
return -1;
}
fwrite(NeuralNet, sizeof(NeuralNet), 1, net);
fclose(net);
printf("done.\n");
}
else
{
printf("Reading network from %s...", netFile);
fread(NeuralNet, sizeof(NeuralNet), 1, net);
fclose(net);
printf("done.\n");
}
/* Calculate the error on the output nodes */
/* (summed over the first NUM_OUTPUT example patterns) */
for(n = 0; n < NUM_OUTPUT; n++)
{
GetError(n);
for(i = 0; i < NUM_OUTPUT; i++)
{
avgerr += fabs(error[1][i]); /* fabs(), not abs(): the errors are floats */
}
}
printf("Average error on output nodes is %f.\n", avgerr);
if(avgerr > .1)
printf("This network needs to be trained!\n");
for(;;)
{
printf("Menu:\n");
printf(" 1. Train network\n");
printf(" 2. Run image from file through network\n");
printf(" 3. Save network to file\n");
printf(" 4. Exit\n");
choice = ' ';
while (choice > '4' || choice < '1')
{
printf("Your choice: ");
scanf(" %c", &choice); /* leading space skips the newline left by earlier input */
}
switch(choice)
{
case '1':
printf("Train for how many cycles? ");
scanf("%i", &cycles);
TrainNetwork(cycles);
avgerr = 0.0;
for(n = 0; n < NUM_OUTPUT; n++)
{
GetError(n);
for(i = 0; i < NUM_OUTPUT; i++)
{
err = error[1][i];
if(err < 0)
err = -err;
avgerr += err;
}
}
printf("Average error is now %f.\n", avgerr);
break;
case '2':
if(avgerr > .1)
printf("If you really insist...\n");
printf("Image file name: ");
scanf("%s", imgFile);
if(!(img = fopen(imgFile, "rb")))
{
printf("Image file not found!\n");
}
else
{
printf("Running...");
RunNetworkWithFile(img);
fclose(img);
hiout = 0;
hi = 0; /* initialize in case no output exceeds zero */
for(i = 0; i < 5; i++)
{
if(NeuralNet[2][i].state > hiout)
{
hiout = NeuralNet[2][i].state;
hi = i + 1;
}
}
printf("done.\nHigh output was %i with value %f.\n", hi, hiout);
}
break;
case '3':
printf("Saving neural network...");
net = fopen(netFile, "wb");
fwrite(NeuralNet, sizeof(NeuralNet), 1, net);
fclose(net);
printf("done.\n");
break;
case '4':
printf("Ok, all done now!\n");
return 1;
}
}
}
void RunNetwork(int img)
{
/* Function forward-propagates image img through the network */
int x, y, l, n, nl;
float val;
/* First, load the pattern into the first layer */
for(y = 0; y < IMG_SIZE_Y; y++)
{
for(x = 0; x < IMG_SIZE_X; x++)
{
NeuralNet[0][y * IMG_SIZE_X + x].state = imgs[img][x][y];
}
}
/* Feed the values through the network */
for(l = 1; l < 3; l++)
{
for(n = 0; n < numNodes[l]; n++)
{
val = NeuralNet[l][n].thresh;
for(nl = 0; nl < numNodes[l-1]; nl++)
{
val = val + NeuralNet[l-1][nl].state * NeuralNet[l-1][nl].weights[n];
}
NeuralNet[l][n].state = 1.0 / (1.0 + exp(0.0 - val));
}
}
}
void RunNetworkWithFile(FILE *img)
{
/* Function forward-propagates image from file through the network */
int x, y, l, n, nl;
float val;
unsigned char c; /* pixel values run 0-255; plain char may be signed */
/* First, load the pattern into the first layer */
for(y = 0; y < IMG_SIZE_Y; y++)
{
for(x = 0; x < IMG_SIZE_X; x++)
{
fread(&c, sizeof(char), 1, img);
NeuralNet[0][x * IMG_SIZE_Y + y].state = (float) c / 255;
}
}
/* Feed the values through the network */
for(l = 1; l < 3; l++)
{
for(n = 0; n < numNodes[l]; n++)
{
val = NeuralNet[l][n].thresh;
for(nl = 0; nl < numNodes[l-1]; nl++)
{
val = val + NeuralNet[l-1][nl].state * NeuralNet[l-1][nl].weights[n];
}
NeuralNet[l][n].state = 1.0 / (1.0 + exp(0.0 - val));
}
}
}
void GetError(int p)
{
int n, n2;
RunNetwork(p);
for(n = 0; n < NUM_OUTPUT; n++)
{
error2[1][n] = datout[p][n] - NeuralNet[2][n].state;
error[1][n] = error2[1][n] * NeuralNet[2][n].state * (1.0 - NeuralNet[2][n].state); /* [2], not [3]: the output layer is index 2 */
}
for(n = 0; n < NUM_HIDDEN; n++)
{
error2[0][n] = 0;
for(n2 = 0; n2 < NUM_OUTPUT; n2++)
{
error2[0][n] += error[1][n2] * NeuralNet[1][n].weights[n2]; /* the weight from hidden node n to output node n2 is stored on the hidden-layer node */
}
error[0][n] = error2[0][n] * NeuralNet[1][n].state * (1.0 - NeuralNet[1][n].state);
}
}
void TrainNetwork(int cycles)
{
int i, p, l, n, in;
for(i = 0; i < cycles; i++)
{
for(p = 0; p < NUM_OUTPUT; p++) /* only the first NUM_OUTPUT of the NUM_IMAGES example patterns are used here */
{
GetError(p);
for(l = 1; l < 3; l++)
{
for(n = 0; n < numNodes[l]; n++)
{
for(in = 0; in < numNodes[l-1]; in++)
{
/* error[] and werror[] index layers as 0 = hidden, 1 = output, hence l-1 */
werror[l-1][n][in] = error[l-1][n] * NeuralNet[l-1][in].state;
}
}
}
for(l = 1; l < 3; l++)
{
for(n = 0; n < numNodes[l]; n++)
{
NeuralNet[l][n].thresh += EPSILON * error[l-1][n];
for(in = 0; in < numNodes[l-1]; in++)
{
NeuralNet[l-1][in].weights[n] += EPSILON * werror[l-1][n][in];
}
}
}
}
}
}
void DemoMode(char *progName)
{
printf("%s is running in demo mode!\n", progName);
printf("Well, at least it will someday!\n");
}
Bibliography
Neural Networks [Online]. Available: http://nastol.astro.lu.se/~henrik/neuralnet1.html
Kutza, Karsten. Neural Networks at your Fingertips [Online]. Available: http://www.geocities.com/CapeCanaveral/1624/index.html
Generation 5: Artificial Intelligence Repository - Introduction to Neural Networks [Online]. Available: http://library.thinkquest.org/18242/nnintro.shtml