Team Number: 054

School Name: Sandia Preparatory School

Area of Science: Computer Security

Project Title: 

Definition of Problem:

    In 2002 the number of known computer viruses surpassed 70,000 (www.securitystats.com). “It is difficult to evaluate anti-virus systems, since they work in the wild and few companies would be willing to turn off their anti-virus software to be part of a control group” (An epidemiological model of virus spread and clean up from HP). One of the most important aspects of detecting and analyzing viruses is the way they spread from computer to computer. Suppose an accurate model of virus spread can be built that also tracks how long an anti-virus company, such as Norton, takes to create a definition for a virus and distribute it to all of its users. Such a model would make it easier to design temporary stopgap measures that help keep viruses from infiltrating a network. It might also help identify new kinds of viruses from the pattern of their distribution.

    “…The word virus is used to denote any malicious code. These codes fall into three main categories, namely virus (in technical sense), Trojan horses, and worms” (An Epidemiological Model of Computer Virus Spread and Countermeasures).

Plan of How to Utilize Computers to Solve the Problem:

    Using C++, and taking an existing program from HP as a base, we can write a program that builds an on-the-fly model of how a virus circulates. The program will let us adjust the time needed to create an anti-virus definition and distribute it. The model is based on how the average virus spreads:

    Following the sequence described in An epidemiological model of virus spread and clean up from HP:

  1. The virus is released into the wild by its creator
  2. The virus spreads freely, infecting machines and delivering its payload
  3. The virus is eventually noticed and the company is alerted
  4. The company then works to isolate the virus and generate a ‘signature’ that can be used in scanning software to detect the presence of the virus
  5. The company then distributes the signature to its clients through a central server that the client machines poll regularly for updates
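    Steps 3 to 5 determine how long a client stays exposed, and that is the delay the model is meant to vary. The C++ sketch below shows that arithmetic. It assumes, purely for illustration, that every client polls the update server at a fixed interval and is protected at its first poll at or after the signature is published. The names LifecycleTimes, clientProtectedAt and worstCaseExposure and all timing values are hypothetical. They are not taken from HP's program.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical lifecycle timings in hours, measured from the virus's release (step 1).
struct LifecycleTimes {
    double detectionDelay;  // steps 2-3: release until the company is alerted
    double signatureDelay;  // step 4: alert until a signature is published
    double pollInterval;    // step 5: time between a client's update polls
};

// Time at which a client polling at pollPhase, pollPhase + P, pollPhase + 2P, ...
// first downloads the signature (assumes a fixed polling interval P).
double clientProtectedAt(const LifecycleTimes& t, double pollPhase) {
    assert(t.pollInterval > 0.0 && pollPhase >= 0.0 && pollPhase < t.pollInterval);
    const double published = t.detectionDelay + t.signatureDelay;
    if (published <= pollPhase) return pollPhase;  // first poll is already after publication
    // Whole polling intervals needed after the first poll to reach publication time.
    const double k = std::ceil((published - pollPhase) / t.pollInterval);
    return pollPhase + k * t.pollInterval;
}

// Upper bound on exposure: a client that polled just before publication
// waits almost one full interval more.
double worstCaseExposure(const LifecycleTimes& t) {
    return t.detectionDelay + t.signatureDelay + t.pollInterval;
}
```

    For example, suppose detection takes 12 hours, the signature follows 24 hours later, and clients poll every 6 hours. A client whose first poll is at hour 1 would then receive the signature at hour 37. No client stays unprotected past hour 42.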

    Our goal is to use a computer or cluster node to create a model or graph of how the virus spreads initially. We then monitor how long it takes to purge the virus from all, or most, of the computers. Note: no ‘live’ viruses are used in this model. “The machines (in the model) move between four states:

  1. Susceptible machines which are vulnerable to infection. (S)
  2. Infected machines that have contracted the virus and are actively spreading it. (I)
  3. Machines in which the virus has been detected and is prevented from spreading the virus. The state includes computers that have been disconnected from its network when a virus is detected to prevent further spreading of the virus as well as computers that have been incapacitated by the virus and thus, are inactive and no longer spreading the virus. (D)
  4. The removed state consists of machines that are immune to the virus. The state includes machines that are not susceptible to the virus to begin with as well as machines in which the virus has been removed and is made immune to the virus by the anti-virus software. (R)”

        (An epidemiological Model of Computer Virus Spread and Countermeasures).
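    The four states can be written as a simple mean-field (difference-equation) model that tracks the fraction of machines in each state. This is our own illustrative sketch, not the equations of the cited paper. The rates beta, delta and rho and the functions step and stepsToPurge are hypothetical. Signature uptake is modelled as a single rate rho that moves machines from S, I or D into R and is zero until the signature is released.

```cpp
#include <cassert>

// Fractions of the population in each of the four states (they sum to 1).
struct SIDR {
    double s, i, d, r;
};

// Hypothetical per-step rates; illustrative assumptions, not published values.
struct Rates {
    double beta;   // new infections per step = beta * s * i   (S -> I)
    double delta;  // fraction of infected machines detected per step (I -> D)
    double rho;    // fraction of S, I and D machines receiving the signature per step (-> R);
                   // 0 until the signature has been released
};

// One step of a discrete-time S-I-D-R model.
SIDR step(const SIDR& x, const Rates& k) {
    assert(k.beta + k.rho <= 1.0 && k.delta + k.rho <= 1.0);  // keeps every state non-negative
    const double infections = k.beta * x.s * x.i;
    const double detections = k.delta * x.i;
    SIDR next;
    next.s = x.s - infections - k.rho * x.s;
    next.i = x.i + infections - detections - k.rho * x.i;
    next.d = x.d + detections - k.rho * x.d;
    next.r = x.r + k.rho * (x.s + x.i + x.d);
    return next;
}

// Steps until infected + detected machines fall below `threshold`, using `before`
// rates until signatureStep and `after` rates from then on; -1 if not reached.
int stepsToPurge(SIDR x, const Rates& before, const Rates& after, int signatureStep,
                 double threshold, int maxSteps) {
    for (int t = 0; t < maxSteps; ++t) {
        if (x.i + x.d < threshold) return t;
        x = step(x, t < signatureStep ? before : after);
    }
    return -1;
}
```

    stepsToPurge is one possible way to measure "how long it takes to purge the virus from all, or most, of the computers". The threshold sets what "most" means. signatureStep is the parameter that would represent the time needed to create and distribute the definition.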

Progress:

    To date, we have written a basic C++ program, based on earlier models, that simulates the spread and detection of a virus using a set of adjustable parameters. Each computer can be susceptible, infected, or detected. The probability that a computer moves into a given state depends only on the state it is currently in, which makes the program a very basic Markov chain. At this point it does little more than move machines from susceptible to infected to detected over a given period of time.
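    A per-machine version of the basic susceptible → infected → detected chain might look like the following sketch. It is an illustration, not the team's actual program. The probabilities in Probs, the functions transition and simulate, and the use of std::mt19937 are all assumptions. Passing the uniform random number u into transition keeps each step deterministic and easy to test. In this basic form the chance of infection is a constant. A fuller model would make it grow with the number of machines currently infected.

```cpp
#include <cassert>
#include <random>
#include <vector>

enum class State { Susceptible, Infected, Detected };

// Hypothetical per-step transition probabilities (illustrative assumptions).
struct Probs {
    double infect;  // P(S -> I) in one step
    double detect;  // P(I -> D) in one step
};

// Markov property: the next state depends only on the current state
// and a fresh uniform random number u in [0, 1).
State transition(State current, double u, const Probs& p) {
    switch (current) {
        case State::Susceptible: return u < p.infect ? State::Infected : State::Susceptible;
        case State::Infected:    return u < p.detect ? State::Detected : State::Infected;
        case State::Detected:    return State::Detected;  // absorbing in this basic chain
    }
    return current;
}

// Simulate `machines` independent computers for `steps` steps and return
// the number still infected after each step.
std::vector<int> simulate(int machines, int steps, const Probs& p, unsigned seed) {
    assert(machines >= 0 && steps >= 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<State> computers(machines, State::Susceptible);
    std::vector<int> infectedCounts;
    for (int t = 0; t < steps; ++t) {
        int infected = 0;
        for (State& s : computers) {
            s = transition(s, uniform(rng), p);
            if (s == State::Infected) ++infected;
        }
        infectedCounts.push_back(infected);
    }
    return infectedCounts;
}
```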

    Our team continues to work with Eric DeBenedictis of the Red Storm Project at Sandia Laboratories, our primary human resource on viruses. We also work with Edward Bedrick of the University of New Mexico Statistics Department, our resource on the statistics involved in the project. We are currently devising a new direction for the project that would add an element of supercomputing to the basic program already written.

Expected Results:   

    In a final prototype, we hope to build a simulation of the possible spread of multiple viruses through a system of computers. Each virus would use its own parameter values, and the model would introduce new states such as non-susceptible and infected-but-dormant. A likely direction for the enhanced simulation is e-mail viruses, which could be given the additional condition of whether or not the viral e-mail is opened. The final makeup of these additions, however, remains to be seen.
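    One hypothetical way to add the e-mail condition and the new states is sketched below. The state names, the MailProbs fields and the functions infectionChance and mailStep are our own illustrative choices. So is the assumption that each viral e-mail is opened independently with the same probability. Under that assumption, a susceptible machine that receives n viral e-mails is infected with probability 1 - (1 - pOpen)^n. To model multiple viruses, each virus could carry its own MailProbs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical extended state set for the planned e-mail model.
enum class MailState { NotSusceptible, Susceptible, Dormant, Active, Detected };

// Probability that a susceptible machine is infected in one step when it receives
// `viralMails` copies and opens each one independently with probability pOpen.
double infectionChance(int viralMails, double pOpen) {
    assert(viralMails >= 0 && pOpen >= 0.0 && pOpen <= 1.0);
    return 1.0 - std::pow(1.0 - pOpen, viralMails);
}

// Hypothetical per-virus parameters (illustrative assumptions).
struct MailProbs {
    double open;      // chance that a single viral e-mail is opened
    double activate;  // chance per step that a dormant infection becomes active
    double detect;    // chance per step that an active infection is detected
};

// One step for a single machine; u is a uniform random number in [0, 1).
MailState mailStep(MailState s, int viralMails, double u, const MailProbs& p) {
    switch (s) {
        case MailState::NotSusceptible: return s;  // never infected by this virus
        case MailState::Susceptible:
            return u < infectionChance(viralMails, p.open) ? MailState::Dormant : s;
        case MailState::Dormant:  return u < p.activate ? MailState::Active : s;
        case MailState::Active:   return u < p.detect ? MailState::Detected : s;
        case MailState::Detected: return s;
    }
    return s;
}
```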