Challenge Team Interim Report

New Mexico Supercomputing Challenge


Team Number: 003

School Name: Sandia Preparatory School

Area of Science: Biochemistry

Project Title: Pattern Analysis of Results from Flow Cytometry Data

Problem Definition:
Modern drug discovery relies on processes that consume large amounts of material and produce large amounts of data. A flow cytometer, which measures cell responses to potential new drug compounds, is a very important instrument in this field.

Our goal is to accomplish two tasks: first, to identify the data belonging to individual test samples within the stream of data output by a flow cytometer, and second, to analyze this refined data. Both tasks are currently done by hand, which costs researchers much time and effort. A computer program could complete them quickly and with minimal effort, and this gain in efficiency is especially important in drug discovery, where very large numbers of samples must be screened.

The Flow Cytometer:
In order to fully comprehend this project, one must first understand the flow cytometer. A flow cytometer is a machine that measures flowing cells to detect cell reactions and responses.

Before being sent through the flow cytometer, the cell samples must be prepared. A potential drug is added to the cells, and the samples are tagged with a fluorescent dye. Different types of light-emitting 'fluorescent tags' can be used to monitor cell responses. Different cells receive different 'tags', which give them different colors or intensities. Next, the cells are carried through the flow cytometer in a saline solution and pass the laser beam at regular intervals. The cells deflect the laser light. If the fluorescence is altered in the presence of the test compound, a reaction between the cells and the potential drug has occurred. Such a reaction is called a "hit". Hits identify potential drug candidates.

The colors of light deflected or emitted by a cell are optically separated and directed into detectors called photomultiplier tubes, or PMTs. A PMT contains a sensor that sends a signal to a computer whenever light from a cell reaches it. The computer can then tell what fluorescence a cell has and at which time interval its light entered the PMT. This appears on the researcher's computer screen as a two-dimensional graph of dots, with the time when each cell came through the machine on the x-axis and its fluorescence on the y-axis.

By looking at the fluorescence of the different particles on this graph, researchers are able to tell what kinds of reactions have taken place because every type of cell receives a characteristic fluorescence. If the potential drug causes this fluorescence to change in a sample, then the sample is a hit. Because hits may occur at rates of one sample per thousand, large numbers of samples are needed and large amounts of data are produced.
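The idea of a hit as a change in characteristic fluorescence can be sketched in C++ as follows. This is only an illustration under stated assumptions: the report does not say how a change is measured, so the comparison of a sample's mean fluorescence against a characteristic value, the function names `meanFluorescence` and `isHit`, and the `tolerance` parameter are all hypothetical.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Average fluorescence of the events in one sample.
// Returns 0.0 for an empty sample so the caller never divides by zero.
double meanFluorescence(const std::vector<double>& values) {
    if (values.empty()) return 0.0;
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}

// Hypothetical hit test (an assumption, not the team's stated method):
// a sample counts as a hit when its mean fluorescence differs from the
// characteristic value of the cell type by more than `tolerance`.
bool isHit(const std::vector<double>& sample_fluorescence,
           double characteristic,
           double tolerance) {
    if (sample_fluorescence.empty()) return false;  // no events, no decision
    double shift = std::fabs(meanFluorescence(sample_fluorescence) - characteristic);
    return shift > tolerance;
}
```

With a characteristic value of 100 and a tolerance of 10, a sample averaging 155 would be flagged and a sample averaging 100 would not. In practice the tolerance would have to be chosen from control samples.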

Problem Solution:
To accomplish our goals, we will be writing code in C++. The first step is to write a program that opens a file containing flow cytometry data and stores the data in an array. Because there is so much data in a file of this type, we have to access more memory; this is done using a program. In the array, every value will be initialized to zero, and each point in the sample data will be saved in the array as a one. We plan to make 'holes' in the array where there is no data, that is, where the values are zero. This way, the lack of data will not take up memory space.

When this is accomplished, we will write a looping program that goes through the data file and counts the cells that passed through the flow cytometer in each time interval. Whenever it reaches a time interval containing more than a defined number of cells, it will identify the presence of a sample. This threshold will be made adjustable once our code works. The sample ends when the number of events drops below the threshold value. The threshold is needed because there are stretches of time when no sample was going through the machine; any information gathered in these time intervals is 'trash'.

The data will be analyzed by numbering each sample and determining whether there was a response. The sample number of a responding sample will be traced back to its test compound, which is then identified as a hit.
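The threshold loop described above might look like the following C++ sketch. It is not the team's code: the names `SampleRange`, `countEventsPerInterval`, and `findSamples` are hypothetical, the input is assumed to be a list of event time stamps, and an interval whose count exactly equals the threshold is assumed to lie outside a sample.

```cpp
#include <cstddef>
#include <vector>

// One detected sample: the half-open range [first_bin, end_bin) of time
// intervals whose event counts stayed above the threshold.
struct SampleRange {
    std::size_t first_bin;
    std::size_t end_bin;
};

// Count how many events fall into each time interval of width bin_width.
// event_times holds one time stamp per detected cell; events outside
// [0, bin_width * num_bins) are ignored.
std::vector<std::size_t> countEventsPerInterval(const std::vector<double>& event_times,
                                                double bin_width,
                                                std::size_t num_bins) {
    std::vector<std::size_t> counts(num_bins, 0);
    for (double t : event_times) {
        if (t < 0.0 || t >= bin_width * num_bins) continue;
        ++counts[static_cast<std::size_t>(t / bin_width)];
    }
    return counts;
}

// Walk the per-interval counts once. A sample starts at the first interval
// with more than `threshold` events and ends at the first later interval
// whose count is no longer above the threshold.
std::vector<SampleRange> findSamples(const std::vector<std::size_t>& counts,
                                     std::size_t threshold) {
    std::vector<SampleRange> samples;
    bool in_sample = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        bool above = counts[i] > threshold;
        if (above && !in_sample) {
            in_sample = true;
            start = i;
        } else if (!above && in_sample) {
            in_sample = false;
            samples.push_back({start, i});
        }
    }
    if (in_sample) samples.push_back({start, counts.size()});  // sample runs to end of data
    return samples;
}
```

The position of each `SampleRange` in the returned vector can serve as the sample number that is later traced back to a test compound. Keeping `threshold` as a parameter matches the plan to make it adjustable.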

Project to Date:
We have succeeded in accessing the data file, and much of the code for the array that stores its contents is written. However, we are not quite ready to go on to the final step in our programming, which is writing the loop.
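One possible way to build an array of ones and zeros with 'holes' where there is no data is to store only the occupied positions. The sketch below is an assumption about how this could be done, not a description of the team's array code; the class name `SparseGrid` and the use of integer bin indices for time and fluorescence are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// A grid of ones and zeros indexed by (time bin, fluorescence bin).
// Only the positions holding a one are stored; every other position is a
// 'hole' that reads back as zero and uses no memory.
class SparseGrid {
public:
    // Record that at least one event fell at this position.
    void mark(int time_bin, int fluorescence_bin) {
        occupied_.emplace(time_bin, fluorescence_bin);
    }

    // Read a position: 1 if an event was recorded there, otherwise 0.
    int at(int time_bin, int fluorescence_bin) const {
        return occupied_.count(std::make_pair(time_bin, fluorescence_bin)) ? 1 : 0;
    }

    // Number of positions that actually take up memory.
    std::size_t storedCells() const { return occupied_.size(); }

private:
    std::set<std::pair<int, int>> occupied_;
};
```

Marking the same position twice stores it only once, so memory use grows with the number of distinct occupied positions rather than with the full size of the graph.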

We plan to begin this last stage of programming with small amounts of data, which should be easier to handle than an entire data file. When we are comfortable with this, we will expand our program to accommodate larger amounts of data.

Project Advisor(s)

Mr. Allen Arsenault
Dr. Larry Sklar

For questions about the Supercomputing Challenge, a 501(c)3 organization, contact us at: consult1516 @ supercomputingchallenge.org New Mexico Supercomputing Challenge, Inc. 80 Cascabel Street Los Alamos, New Mexico 87544 (505) 667-2864
