Characterization of health apps
Scope of the project
Given the huge number of health-related apps available online and their increasing relevance to healthcare, there is a need to understand which, and how many, of these apps are actually useful to users, and how they can improve patients' quality of life.
Brief introduction to methods
The project was carried out entirely in R, chosen for its numerous and versatile packages.
For this project, a combination of web scraping, text mining and machine learning techniques was used.
Project outline
The project is divided into two parts:
1. Identification and data gathering
The goal was to identify all the apps in the “Medical” and “Health and Fitness” categories and to extract information from the apps' web pages in an automated way, collecting it into a database.
For this part we used the R web-scraping package “rvest”, which makes it easy to scrape (or harvest) data from HTML web pages.
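The extraction step can be illustrated with a minimal sketch. The project itself used rvest in R; the Python below only shows the general idea of pulling app names and links out of a category listing page. The HTML snippet, the `app` class name, and the helper names (`AppLinkParser`, `extract_apps`) are all hypothetical and do not reflect the markup of any real app store.

```python
from html.parser import HTMLParser

# Hypothetical markup: a made-up category listing, not a real store page.
SAMPLE_HTML = """
<ul class="category-list">
  <li><a class="app" href="/app/heart-rate-monitor">Heart Rate Monitor</a></li>
  <li><a class="app" href="/app/yoga-daily">Yoga Daily</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""


class AppLinkParser(HTMLParser):
    """Collects the name and URL of every <a class="app"> element."""

    def __init__(self):
        super().__init__()
        self.apps = []
        self._in_app = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "app" in classes:
            self._in_app = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate link text only while inside an app link.
        if self._in_app:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_app:
            name = "".join(self._text).strip()
            self.apps.append({"name": name, "url": self._href})
            self._in_app = False


def extract_apps(html):
    """Return a list of {"name", "url"} records found in the page."""
    parser = AppLinkParser()
    parser.feed(html)
    parser.close()
    return parser.apps


apps = extract_apps(SAMPLE_HTML)
for app in apps:
    print(app["name"], app["url"])
```

In a real pipeline, each extracted URL would then be fetched and parsed in the same way to collect the per-app details stored in the database.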
2. Characterization and classification of health apps
The aim of the second part was to characterize all the apps retrieved, classifying them into the specific medical categories they belong to, and then to analyze the apps' features by means of the newest and most relevant methods available in the recent literature.
For this part we used a combination of text mining and machine learning, developing a naive Bayes classifier to separate medical from non-medical apps. This separation gave us a lighter database of apps that was subsequently analyzed using MetaMap, a highly configurable program to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text.
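To make the classification step concrete, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is written in Python for illustration, whereas the project used R, and it is not the classifier built for the project: the toy app descriptions, the labels, and the function names (`tokenize`, `train`, `predict`) are all invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(docs):
    """Count documents per label and words per label.

    docs is a list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy descriptions, for illustration only.
TRAINING = [
    ("track blood pressure and glucose readings", "medical"),
    ("medication reminder and dosage tracker", "medical"),
    ("insulin dose calculator for diabetes patients", "medical"),
    ("yoga workout and running plans", "non-medical"),
    ("calorie counter and workout tracker", "non-medical"),
]

model = train(TRAINING)
print(predict(model, "blood glucose and insulin log"))
print(predict(model, "running workout plans"))
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once descriptions are longer than these toy examples.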
Full PDF report
You can read more about this project in the full PDF report available here