Thursday, October 29

Data Science

What I learned from looking at 200 machine learning tools
Data Science

By Chip Huyen, a writer and computer scientist, currently at an ML startup in Silicon Valley. To better understand the landscape of available tools for machine learning production, I decided to look up every AI/ML tool I could find. The resources I used include: After filtering out applications companies (e.g., companies that use ML to provide business analytics), tools that aren’t being actively developed, and tools that nobody uses, I got 202 tools. See the full list. Please let me know if there are tools you think I should include but aren’t on the list yet! Disclaimer: This list was made in November 2019, and the market must have changed in the last 6 months. Some tech companies just have a set of tools so large that I can’t enumerate them all. For example, Amazon ...
KDnuggets™ News 20:n28, Jul 22: Data Science MOOCs are too Superficial; The Bitter Lesson of Machine Learning
Data Science

This week on KDnuggets: Data Science MOOCs are too Superficial; The Bitter Lesson of Machine Learning; Building a REST API with Tensorflow Serving; 3 Advanced Python Features You Should Know; Understanding How Neural Networks Think; and much more.
NBA 2020 Statistics – Efficiency and Custom Fantasy Values
Data Science

Motivation: I have been playing NBA fantasy basketball for almost two decades. Each year my friends, their friends, and I join one or more leagues and act as the general managers of our own teams. At the start of each fantasy league, we hold an online draft and pick the players for our teams. We play a format called Head to Head, in which you compare your team's accumulated statistics against your opponent's in 9 different categories. You win the weekly matchup if you win at least 5 of the 9 categories. Based on this format, team owners strategically build their teams to be stronger in some categories, depending on each owner's strategy. During a fantasy season, we constantly look at the average statistics of players around the leag...
Your Next Trip in the US: Categorized Intelligently
Data Science

To top off our intensive three-month experience at NYCDSA, we chose to assist a US travel recommendation start-up by building intelligence and automation into their operations to save hours of manual work. The company targets prospective travelers living in the US who know approximately when they’d like to take a trip within the country, but who want recommendations for exact locations and itineraries rather than already knowing where they’d like to go. Similar to Netflix, the company’s website is designed to get to know users so that it can give them curated recommendations for their vacations. Upon navigating to the website, users fill out a short quiz, including prompts that ask them to select which images inspire them and what kinds of ...
Scraping TED Talks: Trends in Global Issues, Science, and Technology
Data Science

"Scraping TED Talks" is a longitudinal examination of trendiness of TED Talks on global issues, science, and technology. After extracting and transforming unstructured data from multimedia content, different methods and different measures of trendiness were used to inform analysis. Taken together, both methods reveal different sides of the story behind the numbers, as well as the evolution of trends. A composite measure of trendiness was constructed to gain a deeper understanding of the overall trending landscape. After scraping, transforming, and analyzing unstructured data from TED Talks in global issues/technology and scie...
Southwest Airlines Phone Number
Data Science

Why Do We Recommend the Southwest Airlines Phone Number? Our experts provide excellent service and build a trustworthy, healthy relationship with customers. So, if you want to experience something new, book your flight tickets by dialing the Southwest Airlines Phone Number. You can dial the Southwest Phone Number to ask about anything you need to know, including new destinations and services. Mini-Guidance For Your Journey With Southwest! Experiencing hassle? People who are worried about air journey issues can contact the team. When you call us at the Southwest Phone Number, we will give you a range of mini-guides to ease your nerves, and our guide ...
Tackling Tenant Harassment in New York City: A Data-Driven Approach
Data Science

By Jerica Copeny, Samantha Fu, Rebecca Johnson, and Teng Ye. This summer, our team of Data Science for Social Good fellows at the University of Chicago has partnered with the New York City Mayor’s Public Engagement Unit (PEU) with the goal of helping them better target their outreach to tenants who may be experiencing housing-related issues (ranging from eviction to repairs to landlord harassment). New York City passed pioneering legislation guaranteeing low-income New Yorkers a right to free counsel in housing court, but these rights are hollow if at-risk tenants don’t know they have them. In 2015, a Tenant Support Unit was created within PEU to conduct proactive outreach to tenants to educate them about t...
Improving Traffic Safety in Jakarta Through Video Analysis
Data Science

By João Caldeira, Alex Fout, Aniket Kesari, and Raesetje Sefala. UPDATE: We are pleased to announce that this project team won a Highlighted Paper Award at the AI For Social Good NIPS2018 Workshop! Congratulations to the Jakarta Fellows! The World Health Organization (WHO) estimates that over 1.25 million people die each year in traffic accidents. Nearly 2000 such fatalities occur annually in Jakarta, Indonesia alone, making it one of the most dangerous cities in the world for traffic safety. These deaths are tragic, but many of them are preventable through effective city planning. This summer, our team at the Data Science for Social Good Fellowship (DSSG) at the University of Chicago set out to help the city of Jakart...
Improving Workplace Safety in Chile through Proactive Inspections
Data Science

Every year, thousands of Chileans are killed or injured in work-related accidents. This was recently brought to light during the 2010 Copiapó mining accident. Chile’s labor ministry, Dirección del Trabajo (DT), is tasked with increasing workplace safety through inspections and enforcement. But DT’s inspections are largely reactive: complaints come in and then an inspection is completed, often after an injury or death. Preventative inspections can help find safety issues before bad things happen. DT has started moving to preventative inspections. They hired data scientists, built some models, and even ran field trials. But their efforts ran into many challenges, ranging from having data on what labor facilities even ex...
Top 10 ways your Machine Learning models may have leakage
Data Science

By Rayid Ghani, Joe Walsh, and Joan Wang. If you’ve ever worked on a real-world machine learning problem, you’ve probably introduced (and hopefully discovered and fixed) leakage into your system at some point. Leakage is when your model has access to data at training/building time that it wouldn’t have at test/deployment/prediction time. The result is an overoptimistic model that performs much worse when deployed. The most common forms of leakage happen because of temporal issues, such as including data from the future in your model because you have it available when you’re doing model selection, but there are many other ways leakage gets introduced. Here are the most common ones we’ve found working on different real-world problems over the last few years. Hopefully, people will find this us...
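To make the temporal case concrete, here is a minimal sketch in Python of one common guard against it: splitting by time so that training rows always precede test rows. It is an illustration under stated assumptions, not the authors' method. The function names `temporal_split` and `has_no_future_leakage`, the `date` and `label` fields, and the sample rows are all hypothetical.

```python
from datetime import date

def temporal_split(rows, cutoff):
    """Split rows so that training data strictly precedes the cutoff date.

    Assumes each row is a dict with a "date" key (hypothetical schema).
    """
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

def has_no_future_leakage(train, test):
    """Return True if every training row is older than every test row."""
    if not train or not test:
        return True
    return max(r["date"] for r in train) < min(r["date"] for r in test)

# Made-up example rows, for illustration only.
rows = [
    {"date": date(2019, 11, 1), "label": 0},
    {"date": date(2019, 12, 1), "label": 1},
    {"date": date(2020, 1, 1), "label": 0},
    {"date": date(2020, 2, 1), "label": 1},
]

train, test = temporal_split(rows, cutoff=date(2020, 1, 1))
print(len(train), len(test), has_no_future_leakage(train, test))
```

A random shuffle of the same rows could place the February row in training and the November row in testing, which is the kind of "data from the future" the excerpt warns about. Features derived from each row would need the same time-aware treatment.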