Team: 30
School: La Cueva High
Area of Science: Mycology
Interim:
Problem Definition:
Mushroom hunting is a beloved hobby, especially in Europe, and certain mushrooms may offer health benefits because of the antioxidants they contain. Yet despite the existence of numerous datasets, many identification apps still misidentify common poisonous mushrooms as safe. This problem has hurt the prospects of mushroom hunting, because hikers are discouraged from picking mushrooms that are unfamiliar to them.
Problem Solution:
Several datasets document specific mushroom characteristics and the corresponding edibility of species in the Agaricus and Lepiota families, the most widely used being the Mushroom dataset from the University of California, Irvine. This dataset contains 8,124 entries, each with 23 attributes: 22 mushroom characteristics plus an edibility label. There are also online image databases and field guides.
To increase the accuracy of computationally identifying the edibility of a mushroom, my project will focus on linking these datasets so that they can be used together.
The goal is to take a picture of a mushroom, extract the significant characteristics, and predict the edibility from those characteristics. I plan to use the Wolfram Language, a flexible language with built-in machine learning functions.
Problem Progress:
I have extracted the relevant characteristics by analyzing the difference between the number of poisonous and edible entries for each value of a characteristic. Using the Wolfram Language, I created datasets relating edibility to each individual characteristic and to each combination of two characteristics.
To be considered “significant”, a characteristic has to show at least a 100% difference between its counts of edible and poisonous entries. For example, if 80% of the entries with cap-color blue are poisonous, then cap-color blue is a relatively strong indicator. On the other hand, if 50% of the entries with cap-color red are poisonous, then this characteristic gives no useful information, because a mushroom with cap-color red is equally likely to be poisonous or edible.
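The significance test described above can be sketched as follows. This is a minimal illustration in Python rather than the Wolfram Language used in the project. The toy rows, the function name `significant_values`, and the reading of the 100% difference as a difference relative to the smaller of the two counts are all assumptions made for the example.

```python
from collections import Counter

def significant_values(rows, feature, min_relative_diff=1.0):
    """Return {value: (edible, poisonous)} for values of `feature` whose
    edible/poisonous counts differ by at least `min_relative_diff`
    (1.0 = 100%), measured relative to the smaller count (an assumption)."""
    counts = Counter((row[feature], row["class"]) for row in rows)
    result = {}
    for value in {row[feature] for row in rows}:
        edible = counts[(value, "edible")]
        poisonous = counts[(value, "poisonous")]
        smaller, larger = sorted((edible, poisonous))
        # A value seen with only one class is treated as maximally informative.
        if smaller == 0 or (larger - smaller) / smaller >= min_relative_diff:
            result[value] = (edible, poisonous)
    return result

# Toy rows (invented for illustration, not taken from the UCI dataset).
rows = (
    [{"cap-color": "blue", "class": "poisonous"}] * 8
    + [{"cap-color": "blue", "class": "edible"}] * 2
    + [{"cap-color": "red", "class": "poisonous"}] * 5
    + [{"cap-color": "red", "class": "edible"}] * 5
)

print(significant_values(rows, "cap-color"))  # blue is kept, red is dropped
```

With these toy counts, cap-color blue (8 poisonous, 2 edible) passes the threshold, while cap-color red (5 and 5) does not.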
Problem Results:
Using the University of California, Irvine dataset, the most significant characteristic is odor, with an accuracy of almost 100%! Other strong predictors are cap bruises, gill size and color, stalk surface, and spore-print color. Combinations of two variables predict edibility with even greater accuracy: variables such as veil-type, insignificant by themselves, pair with other characteristics to produce much more accurate classifications! The most accurate combination so far is gill size and spore-print color. The main limitation of this approach is that some characteristics that are strong predictors of edibility often cannot be extracted from an image. For example, odor cannot be detected from an image.
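One way to compare a single characteristic against a pair of characteristics is sketched below in Python. It is not necessarily the method used in the project: the majority-vote lookup, the function name `majority_accuracy`, and the four toy rows are assumptions chosen to show how two characteristics that are uninformative alone can become informative together.

```python
from collections import Counter, defaultdict

def majority_accuracy(rows, features):
    """Accuracy of predicting, for each combination of values of `features`,
    the most common class seen with that combination."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        groups[key][row["class"]] += 1
    # Each group contributes the size of its majority class as correct guesses.
    correct = sum(max(counter.values()) for counter in groups.values())
    return correct / len(rows)

# Toy rows (invented): neither feature separates the classes alone,
# but the pair does.
rows = [
    {"gill-size": "broad", "spore-print-color": "white", "class": "edible"},
    {"gill-size": "broad", "spore-print-color": "brown", "class": "poisonous"},
    {"gill-size": "narrow", "spore-print-color": "white", "class": "poisonous"},
    {"gill-size": "narrow", "spore-print-color": "brown", "class": "edible"},
]

single = majority_accuracy(rows, ["gill-size"])
pair = majority_accuracy(rows, ["gill-size", "spore-print-color"])
print(single, pair)
```

Because the accuracy here is measured on the same rows used to build the lookup, it is an optimistic estimate. A held-out test set would give a fairer comparison.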
Future Work:
My next step is to extract these relevant characteristics from an image. First, I plan to label images from an online image dataset with their corresponding characteristics. Then, for each characteristic, I’ll use the machine learning functions in the Wolfram Language (Classify, together with FeatureExtract) to create a function that extracts this characteristic, trained on my labeled set. With this approach, I’ll be able to tailor the training images to each characteristic. For example, when extracting cap-color, there will be no need to include irrelevant images of mushroom gills or stalks in the training dataset.
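The idea of a separate training set per characteristic can be sketched as below. This is only an illustration in Python: the file names, the label format, and the function name `training_sets` are hypothetical, and the actual training step is left out.

```python
from collections import defaultdict

# Hypothetical labels: (image file, characteristic, value).
# The file names are invented for illustration.
labels = [
    ("img_001.jpg", "cap-color", "brown"),
    ("img_002.jpg", "cap-color", "white"),
    ("img_003.jpg", "gill-size", "broad"),
    ("img_004.jpg", "gill-size", "narrow"),
]

def training_sets(labels):
    """Group labeled images by characteristic so that each extractor is
    trained only on images relevant to that characteristic."""
    sets = defaultdict(list)
    for image, characteristic, value in labels:
        sets[characteristic].append((image, value))
    return dict(sets)

sets = training_sets(labels)
print(sets["cap-color"])  # only cap-color images; gill images are excluded
```

Each list of (image, value) pairs could then serve as the training set for one classifier, for example one built with Classify in the Wolfram Language.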
References:
Al-Mejibli, I., and Abd, D. (2017). Mushroom Diagnosis Assistance System Based On Machine Learning by Using Mobile Devices. Journal of Al-Qadisiyah for Computer Science and Mathematics. http://www.qu.edu.iq/journalcm/index.php/journalcm/article/view/319/289
Koivisto, T., et al. (2017). Deep Shrooms: classifying mushroom images. GitHub. https://tuomonieminen.github.io/deep-shrooms/
Shields, T. (2021). The best apps for mushroom identification (and why a book is better). FreshCap Mushrooms. https://learn.freshcap.com/tips/mushroom-identification-app/
Wolfram Language. (2021). Visualize a Dataset Using Feature Extraction. Wolfram. https://www.wolfram.com/language/11/improved-machine-learning/visualize-a-dataset-using-feature-extraction.html?product=language
Team Members:
Sponsoring Teacher: Yolanda Lozano