Free Datasets For Your Data Science Projects

A curated collection of structured datasets for your data analysis, machine learning, and visualization projects.

Disclaimer: These datasets were programmatically scraped by Jon Wayland with no assurances as to their accuracy, and they may contain errors. Please use them for recreational and educational purposes only and do not depend on their accuracy for critical applications.

Available Datasets

Click on a dataset to preview or download. All datasets are provided in CSV format.

Country Hall of Fame Data

List of Country Music Hall of Fame Inductees (1 row per individual), the year they were inducted, their birth date, their date of death if applicable, current age or age at death, and living status.

American Desserts

List of American desserts and their respective ingredients, with each dessert's ingredients stored as a single comma-separated string.
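
Because the ingredients are stored as one comma-separated string, they usually need to be split before analysis. The following is a minimal sketch using only the Python standard library. The sample rows and the column names `dessert` and `ingredients` are assumptions made for illustration and may not match the actual file.

```python
import csv
import io

# Hypothetical sample rows; the real file's column names and values may differ.
SAMPLE_CSV = '''dessert,ingredients
Apple Pie,"apples, sugar, butter, flour"
Brownie,"chocolate, sugar, butter, flour, eggs"
'''


def parse_ingredients(cell):
    """Split a comma-separated ingredient string into a list of trimmed names."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def load_desserts(text):
    """Map each dessert name to its list of ingredients."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["dessert"]: parse_ingredients(row["ingredients"]) for row in reader}


desserts = load_desserts(SAMPLE_CSV)
print(desserts["Apple Pie"])
```

To read the downloaded file instead, the same function could be given the file's contents, for example `load_desserts(open(path, newline="").read())`, where `path` is wherever the CSV was saved.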

Harry Potter Characters

List of all Harry Potter characters, a description of the character, whether they were killed in the series, and who their killer was.

US Hurricanes

List of all US hurricanes, their Saffir-Simpson category, the state they impacted, the date they formed, the date they became extratropical, the date they dissipated, the number of estimated fatalities, and the estimated damage in USD.

Unicorn Founders

List of unicorn companies and their respective founders, including the company name, industry, valuation amount, valuation date, exit date and reason, the valuation at time of exit, the country, the status, and the founder's name. There is 1 row per founder.
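
With 1 row per founder, company-level fields such as valuation repeat on every founder row, so summing them directly would count a company once per founder. The following is a minimal sketch of collapsing the rows to one entry per company first. The sample rows and the column names `company`, `valuation_usd_billions`, and `founder` are assumptions made for illustration and may not match the actual file.

```python
import csv
import io

# Hypothetical sample rows; the real file's column names and values may differ.
SAMPLE_CSV = '''company,valuation_usd_billions,founder
Acme,2.5,Ann Lee
Acme,2.5,Bo Chen
Globex,1.0,Cy Diaz
'''


def founders_by_company(text):
    """Collapse founder-level rows into one record per company."""
    companies = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Keep the valuation from the first row seen for each company.
        entry = companies.setdefault(
            row["company"],
            {"valuation": float(row["valuation_usd_billions"]), "founders": []},
        )
        entry["founders"].append(row["founder"])
    return companies


companies = founders_by_company(SAMPLE_CSV)

# Sum each company's valuation once, not once per founder row.
total_valuation = sum(c["valuation"] for c in companies.values())
print(total_valuation)
```

This sketch assumes the valuation is identical on every row for a given company; if the real data disagrees across rows, a different rule (such as taking the most recent valuation date) would be needed.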

Academy Award Winning Films

List of Academy Award-winning films with their nominations, awards, budget, box office, director, running time, country, language, and release date.

Fictional Practice Datasets

These datasets are entirely synthesized to represent realistic data in domains where real data is too private or restricted to access publicly. They're designed for practicing data analysis, visualization, and modeling in sensitive domains like healthcare.

All data in this section is fictional and created for educational purposes only. While the patterns and relationships resemble real-world data, no real individuals or cases are represented.

Population Health

Synthetic population health dataset that contains chronic conditions, copay amounts, medical costs, risks, and copay plan amounts at the individual level. This data reflects the kind of information that health insurance companies analyze in their advanced data departments.

Emergency Room

Synthetic healthcare dataset that includes ER utilization, cost percentile, member ID, and subsequent severity level.