With permission, I am turning the Shiprock High Team 61 AiS Challenge project, the research paper "Diabetes Calamity Type II," into a computational science project.
Visit and study their research (I think more resources need to be added): http://mode.lanl.k12.nm.us/~chtvanob/SuperComputing/Team%20061%20-%20Diabetes/Calamity.html
Real Problem: Are a family history of diabetes on the Navajo Nation and a poor diet highly correlated?
Area of Science: Medicine and Health
Background: Given the tradition of oral histories among the Navajo people, creating a survey is a great idea. By using the free services at Zoomerang (http://www.zoomerang.com), the students could create, deploy and analyze their survey data. Here is a site that helps design surveys: http://www.statpac.com/surveys/
Students will need to study statistics, because they want to look for patterns. One key phrase that statisticians use is "correlation is not necessarily causation." Just because A is correlated with B doesn't mean that A causes B.
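To make "correlated" concrete, here is a minimal Python sketch that computes a correlation coefficient from yes/no survey answers. The answers below are invented for illustration and are not real survey data; the function name pearson_r and the variable names are my own assumptions. The result measures how closely the two answers move together, and it says nothing about which one causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)

# HYPOTHETICAL answers, not real survey data: 1 = yes, 0 = no.
family_history = [1, 1, 0, 0, 1, 0, 1, 0]
poor_diet = [1, 1, 0, 0, 1, 0, 0, 1]

r = pearson_r(family_history, poor_diet)
print(f"r = {r:.2f}")  # prints "r = 0.50" for these made-up answers
```

A value near 1 or -1 means a strong relationship, and a value near 0 means a weak one. Even a strong value would only tell the students that the two answers tend to go together.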
There is no fixed number for how big the sample should be. The larger the sample, the more information you will receive. You want a broad sample: not just high school students, or males, or vegetarians, but a random sample from a large cross section of the population.
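Here is a small Python sketch of both ideas, offered as an illustration rather than a prescribed procedure. It assumes the students have some roster of people to draw from (the ID numbers below are hypothetical), picks a simple random sample, and uses the standard 95% margin-of-error formula for a yes/no question to show how uncertainty shrinks as the sample grows.

```python
import math
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members so that everyone has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a yes/no proportion.

    p = 0.5 is the worst case; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# HYPOTHETICAL roster: ID numbers standing in for people who could be surveyed.
roster = list(range(1, 501))
chosen = simple_random_sample(roster, 50, seed=1)
print(len(chosen), "people selected at random")

# Quadrupling the sample size only cuts the margin of error in half.
for n in (25, 100, 400):
    print(n, "responses -> margin of error about", round(margin_of_error(n), 3))
```

This is one way to see why there is no single right sample size: each extra response helps, but by less and less.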
The team did go out and find two mentors who helped them with diabetes prevention. I went a step further and contacted a science mentor and a programming mentor. I may now need a statistics mentor.
Why is this a Computational Science Project? My science mentor said a statistical model in a research project is a tool used to answer a question: the research hypothesis. He went on to say that you need enough data to have predictive power. We are using a tool to answer a question; we are looking at numbers and results and seeing if they make sense.
Working Problem: Two reasons to narrow the focus to the Navajo Nation are 1) it is more personally relevant to the students and 2) Navajos are more accessible to the students. But narrowing the study obscures some other questions, such as: Is the incidence of diabetes among Navajos with a bad diet the same as among non-Navajos with a bad diet? Or is there a genetic predisposition towards diabetes? These issues are relevant to the sampling procedure.
Math Model: This project is based on statistics, which might uncover some relationships that could eventually go into a math model, but we are not going there this semester. My programming mentor, Drew Einhorn, said, "A math model is where you take a problem, simplify it, abstract various measurable facts, assume some kind of relationships between the various measurable data, and look at data to calculate simple patterns."
Computer Program: One method for dealing with a limited data set is known as "bootstrapping," where you basically clone random observations from the original data set (sampling with replacement), run multiple regressions and average the results. Here is some information on bootstrapping: http://www.sportsci.org/resource/stats/generalize.html
http://ageco.tamu.edu/faculty/fuller/regression.PDF
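The bootstrapping idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the method from either link: it uses a simple straight-line regression, the (x, y) pairs are made up, and the names fit_line and bootstrap_slope are hypothetical.

```python
import random

def fit_line(points):
    """Least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        return None  # every x is the same, so no slope can be fitted
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_slope(points, rounds=1000, seed=0):
    """Resample with replacement, refit each time, and summarize the slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(rounds):
        # "Clone" random observations: same size as the original data set.
        resample = [rng.choice(points) for _ in points]
        fit = fit_line(resample)
        if fit is not None:
            slopes.append(fit[0])
    slopes.sort()
    average = sum(slopes) / len(slopes)
    low = slopes[int(0.025 * len(slopes))]   # rough 2.5th percentile
    high = slopes[int(0.975 * len(slopes))]  # rough 97.5th percentile
    return average, low, high

# HYPOTHETICAL (x, y) measurements, made up for illustration only.
data = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8), (5, 6.1), (6, 6.8), (7, 8.3)]

average, low, high = bootstrap_slope(data)
print("average slope:", round(average, 2))
print("middle 95% of slopes:", round(low, 2), "to", round(high, 2))
```

The spread between the low and high values gives a feel for how much the answer could change if a different small sample had been collected.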
A free statistical analysis package, R, is at http://www.r-project.org/, and there is a tutorial on how to use it with unsophisticated users at
My programming mentor did a search on Google for statistics software "open source."
Open source is the current buzzword that includes free software. It is free in the sense that you don't have to pay for it, and also in a political and philosophical sense: the source code is open, shared, discussed and developed by a community. Companies like Red Hat, with Red Hat Linux, take open source software, add support and market it.
Here is what the Google search found:
http://www.google.com/search?num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&q=statistics+software+"open+source"&btnG=Google+Search
Then a few entries down the page Drew found:
"Using Open Source Software to Teach Mathematical Statistics" (PDF)
www.stat.wisc.edu/~bates/JSM2001.pdf
Outline: Open Source Software for Statistics; What is R?; Obtaining and installing R; An Example - MLE; Conclusions from the example; Links
He followed that link because it looked interesting, and he scrolled down to "How do I get R?" R is an open source implementation of S, a language for doing statistics developed by John Chambers, which also has a commercial version. He then found www.r-project.org. He then went to the mailing list page, subscribed to the listserv and asked where to find an appropriate introduction/tutorial.
Drew also shared this resource page, Grades 9-12, Data Analysis and Probability, at
http://www.mste.uiuc.edu/stat/stat.html
and these resources too:
Here are some books on teaching statistics.
http://www.dickinson.edu/~rossman/ws/
Here's a sample syllabus for teaching statistics to students with a high school
algebra background using this book.
http://www.dickinson.edu/~rossman/ws/wssyllabus.html
Here is a website with other statistics projects:
http://www.aw-bc.com/weiss/e_iprojects/index.htm
Results/Conclusions:
Tips from my science mentor, Hans Petersen,
Lovelace Respiratory Research Institute:
A statistical model in a research project is a tool used to answer a
question - the research hypothesis. The
hypothesis stated in the paper ("to inform...") does not lend itself
directly to a statistical model approach. Nor does the obvious hypothesis
("do sloth and junk food lead to type II
diabetes?") because you wouldn't want any of the study subjects
to present a "positive" outcome.
Some questions: How do surveys really work? How do politicians misuse surveys?
Are these sources of data helpful?
New Mexico Diabetes Statistics
http://www.laplaza.org/health/dwc/patient/stats_nm_txt.html
Southwest Diabetes Prevention Center
http://hsc.unm.edu/ndpc/site.htm
The students state that the poor water quality on the Rez causes them to drink more soda and eat more junk food. Interesting theory; how can they prove this?
StarLogo has a program that concretely shows how eating better and exercising can affect diabetes.