Ted Petrou, founder of Dunder Data and author of several Python data science books, is a well-known expert in data science and machine learning. Punchlist Zero caught up with him to talk about how data science is transforming manufacturing.
PL0: If I’m in the manufacturing space, why even care about analytics?
DD: Many industries use rudimentary analytics to understand their business. Being able to see what is going on in your company in a visual, readable way is very important.
Think of a hospital visit. Certain measurements are checked routinely, and that feedback loop is important. Much of this kind of monitoring is already built into CRM tools such as Salesforce. Often advanced analytics isn’t even needed, because out-of-the-box platforms can provide the insight you need.
PL0: So when does deep learning or machine learning come into play?
DD: Deep learning or machine learning should be one of the last steps, used to optimize the business and gain market share. Once the business fundamentals are in place, it’s time to look at machine learning. A lot of people think they should do machine learning because it’s popular right now, but it is not always a fit. In fact, it is easy to create negative value by using data improperly.
PL0: So what is an example of an application that might be a fit for machine learning?
DD: A good use case would be predicting equipment lifetime. The same goes for tools and maintenance needs. Business owners should be cautious about taking the results of a model and overapplying them to business scenarios; domain knowledge should be incorporated. It is easy to make big mistakes and draw poor inferences if the power of AI is not harnessed correctly.
For example, the Great Pyramid of Giza remained the tallest human-made structure for about 3,800 years. If, in 2500 B.C., you had modeled future building heights from the recent trend, you would have dramatically overestimated how quickly they would change. The moon program is another example. We first landed on the moon 50 years ago, landed six times in about three and a half years, and haven’t gone further since. Progress rarely continues along a straight line, and there is a limit to how much machine learning can help a company.
PL0: What are the steps for an organization that wants to apply machine learning to its operations?
DD: Before any machine learning is attempted, there must be data, because all machine learning relies on it. Systems for data collection, storage, retrieval, and maintenance must be robust and reliable before that data can be used for machine learning. Once your organization has a trusted system in place to manage its data, it is possible to attempt machine learning.
Begin with an extremely simple task that uses a small amount of data. Pick a supervised learning task, one where every input has a known output. You will be able to grade your machine learning model’s performance on that task and compare it to how a human scored on the same task. Then build and deploy an application that uses the predictions from this simple task. If you can put all these pieces together, you have proven that machine learning is a possibility for your organization. From there, develop each stage of the process further by using more data, a more complex model, an entirely different task, or a more complex application.
PL0: What is your advice to a new engineer or potential student of machine learning?
DD: Learn general programming. Understanding the fundamentals of programming and coupling that knowledge with basic statistics is very valuable. The practical next step is to lean heavily on one of the established libraries to start implementing models. I recommend Python with scikit-learn, which is really powerful and easy to get started with.
PL0: Tell us about Dunder Data and the plans for the future.
DD: I am excited to continue building on the great courses that Dunder is delivering. To that end, in the spring of 2020 we will launch a platform that offers a comprehensive path to mastering data analysis, visualization, and machine learning with Python.
Dunder Data was founded in 2017 to provide a clear and direct way to learn data science in the Python ecosystem. Dunder Data offers hundreds of pages of free content along with its renowned weeklong “Intro to Data Science” course.