The era of Big Data has helped democratize information, creating a wealth of data and growing revenues at technology-based companies. But for all this intelligence, we’re not getting the level of insight from machine learning that one might expect, as many companies struggle to make machine learning (ML) projects actionable and useful. A successful AI/ML program doesn’t start with a big team of data scientists. It starts with strong data infrastructure. Data needs to be accessible across systems and ready for analysis so data scientists can quickly draw comparisons and deliver business results. The data also needs to be reliable, and that is the challenge many companies face when starting a data science program.
The problem is that many companies jump feet first into data science, hire expensive data scientists, and then discover they don’t have the tools or infrastructure data scientists need to succeed. Highly paid researchers end up spending their time categorizing, validating and preparing data instead of searching for insights. This infrastructure work is important, but assigning it to data scientists keeps them from applying the skills that add the most value.
When leaders evaluate the reasons for the success or failure of a data science project (and 87% of projects never make it to production), they often discover their company tried to jump ahead to the results without building a foundation of reliable data. Without that solid foundation, data engineers can spend up to 44% of their time maintaining data pipelines as APIs or data structures change. Automating data integration can give engineers that time back and ensure companies have all the data they need for accurate machine learning. It also helps cut costs and maximize efficiency as companies build their data science capabilities.
Machine learning is finicky — if there are gaps in the data, or it isn’t formatted properly, machine learning either fails to function, or worse, gives inaccurate results.
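As a rough illustration of catching gaps and formatting problems before they reach a model, the sketch below scans a few records for missing or unparseable values. The field names, the `SCHEMA` mapping, and the `find_problems` helper are hypothetical and chosen only for this example; a real pipeline would likely rely on a dedicated validation tool.

```python
from datetime import datetime

# Hypothetical schema: field name -> parser that raises ValueError on bad input.
SCHEMA = {
    "customer_id": str,
    "signup_date": lambda v: datetime.strptime(v, "%Y-%m-%d"),
    "monthly_spend": float,
}


def find_problems(rows):
    """Return (row_index, field, reason) tuples for gaps and format errors."""
    problems = []
    for i, row in enumerate(rows):
        for field, parse in SCHEMA.items():
            value = row.get(field)
            if value is None or value == "":
                problems.append((i, field, "missing"))
                continue
            try:
                parse(value)
            except (ValueError, TypeError):
                problems.append((i, field, "bad format"))
    return problems


# Illustrative records only; not real data.
rows = [
    {"customer_id": "c1", "signup_date": "2021-03-01", "monthly_spend": "120.50"},
    {"customer_id": "c2", "signup_date": "03/01/2021", "monthly_spend": "80"},
    {"customer_id": "c3", "signup_date": "2021-05-12", "monthly_spend": ""},
]

problems = find_problems(rows)
for row_index, field, reason in problems:
    print(f"row {row_index}: {field} is {reason}")
```

Flagging problems explicitly, rather than silently dropping rows, keeps the team aware of how much of the data set is actually usable for training.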
When companies are uncertain about their data, most ask the data science team to manually label the data set as part of supervised machine learning, but this is a time-intensive process that brings additional risks to the project. Worse, when the training examples are trimmed too far because of data issues, the narrow scope may mean the ML model can only tell us what we already know.
The solution is to ensure the team can draw from a comprehensive, central store of data, encompassing a wide variety of sources and providing a shared understanding of the data. This improves the potential ROI from the ML models by providing more consistent data to work with. A data science program can only evolve if it’s based on reliable, consistent data, and an understanding of the confidence bar for results.
One of the biggest challenges to a successful data science program is balancing the volume and value of the data when making a prediction. A social media company that analyzes billions of interactions each day can use the large volume of relatively low-value actions (e.g. someone swiping up or sharing an article) to make reliable predictions. If an organization is trying to identify which customers are likely to renew a contract at the end of the year, then it’s likely working with smaller data sets with large consequences. Since it could take a year to find out if the recommended actions resulted in success, this creates massive limitations for a data science program.
In these situations, companies need to break down internal data silos and combine all the data they have to drive the best recommendations. This may include zero-party information captured with gated content, first-party website data, and data from customer interactions with the product, along with successful outcomes, support tickets, customer satisfaction surveys, even unstructured data like user feedback. All of these sources contain clues as to whether a customer will renew their contract. By combining data silos across business groups, metrics can be standardized, and there’s enough depth and breadth to create confident predictions.
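A minimal sketch of what combining silos can look like in practice, assuming each system exports records keyed by a shared `customer_id`. The source names, fields, and the `combine_sources` helper are hypothetical; the point is only that one merged profile per customer exposes both the signals and the gaps.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems, each keyed by customer_id.
product_usage = [
    {"customer_id": "c1", "weekly_logins": 14},
    {"customer_id": "c2", "weekly_logins": 2},
]
support_tickets = [{"customer_id": "c2", "open_tickets": 5}]
survey_scores = [
    {"customer_id": "c1", "csat": 9},
    {"customer_id": "c3", "csat": 4},
]


def combine_sources(*sources):
    """Merge records from every source into one profile per customer."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    profiles[key][field] = value
    return dict(profiles)


profiles = combine_sources(product_usage, support_tickets, survey_scores)
for customer_id, profile in sorted(profiles.items()):
    print(customer_id, profile)
```

Customers who appear in only one system still get a profile, so missing fields stay visible instead of disappearing in the join.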
To avoid the trap of diminishing confidence and returns from an ML/AI program, companies can take the following steps.
By building the right infrastructure for data science, companies can see what’s important for the business, and where the blind spots are. Doing the groundwork first can deliver solid ROI, but more importantly, it will set the data science team up for significant impact. Getting a budget for a flashy data science program is relatively easy, but remember, the majority of such projects fail. It’s not as easy to get budget for the “boring” infrastructure tasks, but data management creates the foundation for data scientists to deliver the most meaningful impact on the business.
Alexander Lovell is head of product at Fivetran.