A curated collection of structured datasets for your data analysis, machine learning, and visualization projects.
Disclaimer: These datasets were programmatically scraped by Jon Wayland with no assurances as to their accuracy and may contain errors. Please use them for recreational and educational purposes only, and do not depend on their accuracy for critical applications.
Click on a dataset to preview or download. All datasets are provided in CSV format.
List of Country Music Hall of Fame Inductees (1 row per individual), the year they were inducted, their birth date, their date of death if applicable, current age or age at death, and living status.
List of American desserts and their respective ingredients, stored as a comma-separated string.
List of all Harry Potter characters, a description of the character, whether they were killed in the series, and who their killer was.
List of all US hurricanes, their Saffir-Simpson category, the state they impacted, the date they formed, the date they became extratropical, the date they dissipated, the number of estimated fatalities, and the estimated damage in USD.
List of unicorn companies and their respective founders, including the company name, industry, valuation amount, valuation date, exit date and reason, the valuation at time of exit, the country, the status, and the founder's name. There is 1 row per founder.
List of Academy Award-winning films, with their nominations, awards, budget, box office, director, running time, country, language, and release date.
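Two of the layouts described above need a little reshaping before analysis: the desserts file keeps ingredients in a single comma-separated string, and the unicorn file has 1 row per founder rather than 1 row per company. The following is a minimal Python sketch of both steps using only the standard library. The column names (`dessert`, `ingredients`, `company`, `founder`) and the sample rows are placeholders assumed for illustration, not the actual headers or records in the CSV files, so adjust them after previewing a dataset.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; these are NOT real records,
# and the column names are assumptions about the file layout.
DESSERTS_CSV = (
    "dessert,ingredients\n"
    'Example Dessert A,"flour, sugar, butter"\n'
    'Example Dessert B,"milk, sugar"\n'
)
FOUNDERS_CSV = (
    "company,founder\n"
    "Example Co,Founder One\n"
    "Example Co,Founder Two\n"
    "Other Co,Founder Three\n"
)


def split_ingredients(csv_text, column="ingredients"):
    """Return each row with its comma-separated string split into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split on commas and drop surrounding whitespace and empty pieces.
        row[column] = [p.strip() for p in row[column].split(",") if p.strip()]
        rows.append(row)
    return rows


def founders_by_company(csv_text, key="company", value="founder"):
    """Collapse a 1-row-per-founder file into a company -> founders mapping."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[key]].append(row[value])
    return dict(grouped)


desserts = split_ingredients(DESSERTS_CSV)
companies = founders_by_company(FOUNDERS_CSV)

print(desserts[0]["ingredients"])  # ['flour', 'sugar', 'butter']
print(len(companies))              # 2 distinct companies from 3 founder rows
```

To run this against a downloaded file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to either function. Grouping by company first matters for the unicorn data, because counting rows directly would count founders, not companies.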
These datasets are entirely synthesized to represent realistic data in domains where real data is too private or restricted to access publicly. They're designed for practicing data analysis, visualization, and modeling in sensitive domains like healthcare.
Synthetic population health dataset that contains chronic conditions, copay amounts, medical costs, risks, and copay plan amounts at the individual level. This data reflects what health insurance companies analyze in their advanced data departments.
Synthetic healthcare dataset that includes ER utilization, cost percentile, member ID, and subsequent severity level.