Grocery Shopping Behavior Analysis

10/2019 - 11/2019
MySQL, Python (#Code)

• Designed a relational database and loaded the data into it with Python
• Queried a database of 50 million rows with MySQL to extract the relevant information
• Analyzed the data, including household behavior and grocery shopping patterns
• Visualized the results with Python Matplotlib

Lung Cancer Mortality and Air Quality

01/2020 - 08/2020
R, DataRobot (#Code)

• Gathered data from the CDC and EPA websites, cleaned it, and plotted descriptive statistics
• Researched the relationship between lung cancer incidence rates and environmental indicators using machine learning, including several classification and regression methods

Healthcare Data Mining

10/2019 - 12/2019
R, Python (#Code)

• Insurance Market Analysis: Cleaned over 1 GB of data to compare service and cost distributions, and identified the insurance companies with the highest market share in each state
• Insurance Claim Data Mining: Clustered DRGs (diagnosis-related groups) by their associated charges per PCCR (primary cost center) using k-means in Python

Information Visualization with Tableau

10/2019 - 12/2019
Tableau, R (#Code)

• Used Tableau to build self-service and geospatial analytics
• Cleaned data with R to fit Tableau's input format
• The reports include screenshots of the Tableau dashboards along with brief introductions

