Diabetes Modeling

With permission, I am turning the Diabetes Calamity Type II research paper, an AiS Challenge project by Shiprock High Team 61, into a computational science project.

Visit and study their research (I think more resources need to be added): http://mode.lanl.k12.nm.us/~chtvanob/SuperComputing/Team%20061%20-%20Diabetes/Calamity.html

Real Problem: Are a family history of diabetes on the Navajo Nation and a poor diet highly correlated?  

Area of Science: Medicine and Health  

Background: Given the tradition of oral histories among the Navajo people, creating a survey is a great idea.  By using the free services at Zoomerang (http://www.zoomerang.com), the students could create and deploy their survey and analyze the data.  Here is a site that helps design surveys: http://www.statpac.com/surveys/

Students will need to study statistics, because they want to look for patterns in their data.  One key phrase that statisticians use is "correlation is not necessarily causation."  Just because A is correlated with B doesn't mean that A causes B.
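To make "correlated" concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient by hand. The two lists are made-up answers from eight hypothetical survey respondents, not real data, and the variable names are only illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up answers from 8 hypothetical respondents (NOT real survey data).
sodas_per_week = [2, 14, 7, 21, 0, 10, 5, 18]
relatives_with_diabetes = [0, 3, 1, 4, 0, 2, 1, 3]

r = pearson_r(sodas_per_week, relatives_with_diabetes)
# r near +1 or -1 means a strong linear pattern; r near 0 means little or none.
print(f"r = {r:.2f}")
```

Even a value of r close to 1 here would only say that the two answers move together. It would not say that soda causes diabetes in the family, or the other way around.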

There is no fixed number for how big the sample should be.  The larger the sample, the more information you receive.  You want a broad sample: not just high school students, males, or vegetarians, but a random sample of a large cross section of the population.
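One way to see why a larger sample gives more information is the usual approximate 95% margin of error for a yes/no survey question, 1.96 * sqrt(p * (1 - p) / n), where p is the fraction answering "yes" and n is the sample size. The sketch below assumes a simple random sample and uses p = 0.5 (the worst case); the sample sizes are arbitrary illustrations.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Arbitrary sample sizes, chosen only to show the trend.
for n in (25, 100, 400, 1600):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:4d}: about +/- {100 * moe:.1f} percentage points")
```

Notice that the sample has to be four times as big to cut the margin of error in half. The formula does not help if the sample is not random.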

The team did go out and find two mentors who helped them with diabetes prevention.  I went a step further and contacted a science mentor and a programming mentor. I may now need a statistics mentor.

Why is this a Computational Science Project?   My science mentor said, “a statistical model in a research project is a tool used to answer a question - the research hypothesis.”  He went on to say that you need enough data to have “predictive power.”  We are using a tool to answer a question: we look at the numbers and results to see whether they make sense.

Working Problem:  Two reasons to narrow the focus to the Navajo Nation are that 1) it is more personally relevant to the students and 2) Navajos are more accessible to the students.  But narrowing the study obscures some other questions, such as: Is the incidence of diabetes among Navajos with a bad diet the same as among non-Navajos with a bad diet?  Or is there a genetic predisposition towards diabetes?  These issues are relevant to the sampling procedure.

Math Model:  This project is based on statistics, which might uncover some relationships that could eventually go into a math model, but “we are not going there this semester.”  My programming mentor, Drew Einhorn, said, “A math model is where you take a problem, simplify it, abstract various measurable facts, assume some kind of relationship between the various measurable data, and look at data to calculate simple patterns.”

Computer Program:  One way to deal with a limited data set is a method known as "bootstrapping," where you basically clone random observations from the original data set (that is, resample it with replacement), run multiple regressions, and average the results.  Here is some information on bootstrapping: http://www.sportsci.org/resource/stats/generalize.html.

This may not give you the predictive power of a large sample, but it allows you to develop and practice analytical methods.
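Here is a rough Python sketch of the bootstrapping idea. To keep it short it fits a one-variable regression (a least-squares slope) on each resample rather than a full multiple regression. The data pairs are made up, and the function names, the 1000 resamples, and the fixed random seed are all assumptions chosen for illustration.

```python
import random

def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def bootstrap_slopes(points, n_resamples=1000, seed=0):
    """Fit the slope on many resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    slopes = []
    while len(slopes) < n_resamples:
        # "Clone" random observations: same size as the original data set.
        resample = [rng.choice(points) for _ in points]
        if len({x for x, _ in resample}) < 2:
            continue  # every x is the same, slope undefined; draw again
        slopes.append(slope(resample))
    return slopes

# Made-up (sodas per week, relatives with diabetes) pairs; NOT real data.
data = [(2, 0), (14, 3), (7, 1), (21, 4), (0, 0), (10, 2), (5, 1), (18, 3)]

slopes = sorted(bootstrap_slopes(data))
average = sum(slopes) / len(slopes)
low = slopes[int(0.025 * len(slopes))]
high = slopes[int(0.975 * len(slopes))]
print(f"slope on original data: {slope(data):.3f}")
print(f"bootstrap average:      {average:.3f}")
print(f"middle 95% of resamples: {low:.3f} to {high:.3f}")
```

The average is what the method above asks for. The spread between the low and high values is an extra: it gives a rough sense of how much the answer could change with a different small sample.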

Multiple regression is a statistical method that looks at how one outcome relates to several factors at once, helping find patterns in the survey data.
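As a small illustration of what a multiple regression computes, the Python sketch below fits y = b0 + b1*x1 + b2*x2 by least squares using the normal equations. Everything about the data is an assumption: the predictors (sodas per week, hours of exercise per week) and the outcome (a hypothetical blood sugar reading) are made up, and the outcome was generated exactly from 90 + 1.5*sodas - 2*exercise so that the fit can be checked against known coefficients.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up predictors: (sodas per week, hours of exercise per week).
predictors = [(2, 5), (14, 1), (7, 3), (21, 0), (0, 6), (10, 2), (5, 4), (18, 2)]
# Hypothetical outcome, generated as 90 + 1.5*sodas - 2*exercise.
outcome = [90 + 1.5 * s - 2 * e for s, e in predictors]

coeffs = multiple_regression(predictors, outcome)
print("intercept, soda effect, exercise effect:",
      [round(c, 3) for c in coeffs])
```

Real survey data will not fit this neatly, which is why a spreadsheet or a statistics package that also reports how uncertain each coefficient is will be the better tool for the actual project.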

There are several ways to run multiple regressions:

1) Use a spreadsheet. Here is a link that explains how to estimate multiple regressions and interpret the data: http://ageco.tamu.edu/faculty/fuller/regression.PDF

2) Use a free statistical analysis package (http://www.r-project.org/) and a tutorial on how to use it with unsophisticated users at

My programming mentor did a search on Google for statistics software "open source".

Open source is the current buzzword that includes free software.  It is free in the sense that you don't have to pay for it, and also in a political and philosophical sense: the source code is open, and it is shared, discussed, and developed by a community.  Companies like Red Hat, with Red Hat Linux, take open source software, add support, and market it.

Here is what the Google search found:
http://www.google.com/search?num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&q=statistics+software+"open+source"&btnG=Google+Search

Then a few entries down the page Drew found:

[PDF] Using Open Source Software to Teach Mathematical Statistics
... Page 2. Outline Open Source Software for Statistics What is R? Obtaining and installing R An Example - MLE Conclusions from the example Links Using Open Source ...
www.stat.wisc.edu/~bates/JSM2001.pdf

He followed that link because it looked interesting and scrolled down to "How do I get R?"  R is an open source implementation of S, a language for doing statistics created by John Chambers; S also exists in a commercial version.  He then found www.r-project.org.

He then went to the mailing list page, subscribed to the listserv, and asked where to find an appropriate introduction/tutorial.

Drew also shared this resource page, Grades 9-12, Data Analysis and Probability, at

http://www.mste.uiuc.edu/stat/stat.html

and these resources too:

Here are some books on teaching statistics.

http://www.dickinson.edu/~rossman/ws/

Here's a sample syllabus for teaching statistics to students with a high school algebra background using this book.

http://www.dickinson.edu/~rossman/ws/wssyllabus.html

Here is a website with other statistics projects:

http://www.aw-bc.com/weiss/e_iprojects/index.htm  

Results/Conclusions:

Tips from my science mentor, Hans Petersen, Lovelace Respiratory Research Institute:  A statistical model in a research project is a tool used to answer a question - the research hypothesis.  The hypothesis stated in the paper ("to inform...") does not lend itself directly to a statistical model approach. Nor does the obvious hypothesis ("do sloth and junk food lead to type II diabetes?") because you wouldn't want any of the study subjects to present a "positive" outcome.

Some questions: How do surveys really work?  How do politicians misuse surveys? 

Are these sources of data helpful?

New Mexico Diabetes Statistics

http://www.laplaza.org/health/dwc/patient/stats_nm_txt.html

Southwest Diabetes Prevention Center

http://hsc.unm.edu/ndpc/site.htm

The students state that the poor water quality on the Rez causes them to drink more soda and eat more junk food.  Interesting theory; how can they prove this?

StarLogo has a program that concretely shows how eating better and exercising can affect diabetes.