• Designed a relational database and loaded the data into it using Python
• Extracted information with MySQL from a database of 50 million rows
• Conducted data analysis, including household behavior analysis and grocery shopping analysis
• Visualized the results with Matplotlib in Python
• Gathered data from the CDC and EPA websites, cleaned the data, and plotted descriptive statistics
• Researched the relationship between lung cancer incidence rates and environmental indicators using machine learning, including several classification and regression methods
• Insurance Market Analysis: Cleaned over 1 GB of data to compare service and cost distributions, and identified the insurance companies with the highest market share in different states
• Insurance Claim Data Mining: Conducted a k-means cluster analysis in Python on DRGs (diagnosis-related groups), based on their associated charges by PCCR (primary cost center)
• Used Tableau to build self-service and geospatial analytics
• Cleaned data with R to fit the format Tableau requires
• Screenshots of the Tableau dashboards and a brief introduction are included in the reports
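The k-means step behind the insurance claim clustering can be sketched as follows. This is a minimal, hypothetical illustration, not the code used in the project: it assumes each DRG is represented by a tuple of charges, one per PCCR, and it uses made-up DRG labels (`DRG_A` and so on) and made-up charge values. It implements plain k-means (Lloyd's algorithm) with the Python standard library rather than any particular clustering package.

```python
import random
from math import dist


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen input points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical data: each DRG maps to (charges at cost center 1, charges at cost center 2).
charges = {
    "DRG_A": (1200.0, 300.0),
    "DRG_B": (1100.0, 350.0),
    "DRG_C": (9000.0, 4000.0),
    "DRG_D": (9500.0, 4200.0),
}
codes = list(charges)
labels, centroids = kmeans([charges[c] for c in codes], k=2)
for code, label in zip(codes, labels):
    print(code, "-> cluster", label)
```

Because charges in different cost centers can differ by orders of magnitude, the features are often standardized before clustering. A library implementation such as scikit-learn's `KMeans` would usually replace the hand-written loop.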