

Bridging the data science gap

Given the shortage of qualified data scientists, companies are tapping the domain expertise of employees such as engineers.

By Seth DeLand

Data science is one of the most in-demand fields today, and there is no shortage of opportunity for those with the requisite computer science and statistics skills. Since 2012, demand for data scientists has risen 650 percent, yet according to data from LinkedIn there are still only 35,000 qualified data scientists working in the United States today. Because many companies do not have the time or resources to properly onboard and train new data scientists, some businesses are getting creative and leveraging the domain expertise of the engineers they already employ.

Engineers Unlock Data Science Tools

Domain expertise is the differentiator that will allow businesses to harness the power of data science by building on trusted relationships with existing engineers. After all, these highly skilled employees have spent countless hours designing the tools their organizations rely on every day, and their familiarity with the DNA of the business makes them well suited to develop data science programs that fit the unique anatomy of their organization.

To fully tap into the power of an engineer’s domain expertise, some companies have found that the use of big data, machine learning and software tools such as MATLAB can help reduce training times and optimize the onboarding process. Through these tools, engineers can explore different data environments, overlay different methodologies to identify inefficiencies or errors, and maximize productivity within a recurring system.

The integration of skill sets also enables engineers to establish more effective data analysis strategies. Rather than sorting through superfluous data sets, or establishing machine learning processes on inconclusive data, they can develop new algorithms that repeatedly analyze sourced data, saving hours of time and, potentially, hundreds of thousands of dollars.

New Frontier for Domain Experts

There are several ways to leverage big data to maximize profitability within a specific industry. For example, the power of the Internet and the rise of the always-on consumer are allowing informed engineers to extract data from thousands of new sources and develop even more sophisticated demographic profiles.

The combination of domain expertise and data science can be applied to industries such as pharmaceuticals, where it can help companies decide between developing a more cost-effective version of a lifesaving drug and investing in new medicines. By being more adept at analyzing the data made available by the Internet, domain experts can determine outcomes such as whether patients are purchasing appropriate prescriptions, or whether cost is depressing sales. In turn, businesses can better understand at what quantities and price points medications are most profitable, while delivering them to a greater number of people.

Perhaps one of the most beneficial areas for engineers to focus their domain expertise is the manufacturing and production processes at their own company. While a data scientist can come in, examine the process for inefficiencies and mathematically pinpoint extraneous expenses, the engineer's added business knowledge helps identify process inefficiencies faster, diagnose why the problems exist, and even build new systems that improve the overall design.

Technologies and Tools

The question now becomes, what technologies are best suited for implementing and optimizing domain expertise?

Hadoop and Spark, two powerful projects within the Apache Software Foundation framework, allow data scientists to run complex sets of algorithms that analyze very large data sets, more commonly referred to as "big data." Yet for engineers who are accustomed to working with data in files on desktop machines, on network drives or in traditional databases, these tools require a different way of accessing the data before analysis can even be considered. What's more, big data workloads often involve disk reads and writes, as well as data transfers across the network, which can slow computations in ways that engineers unfamiliar with these systems may not anticipate.
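
To make that shift concrete, the short sketch below contrasts the familiar desktop pattern of reading a file into memory with reading similar data from a cluster through Spark's Python API. It is illustrative only; the file paths and column names are hypothetical.

```python
# Illustrative sketch: local-file analysis vs. cluster-based analysis.
# File paths and column names are hypothetical.

import pandas as pd
from pyspark.sql import SparkSession

# Familiar desktop workflow: the whole file fits in memory on one machine.
local_df = pd.read_csv("sensor_log.csv")
print(local_df["temperature"].mean())

# Big data workflow: the data lives in HDFS and is processed across a cluster.
spark = SparkSession.builder.appName("sensor-analysis").getOrCreate()
cluster_df = spark.read.csv("hdfs:///data/sensor_logs/*.csv",
                            header=True, inferSchema=True)

# Work is expressed as transformations; disk reads and network shuffles
# happen behind the scenes when an action such as show() finally runs.
cluster_df.groupBy("machine_id").avg("temperature").show()
```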

Advanced modeling tools help address these challenges by establishing connections to big data systems such as Hadoop. Engineers can stay in a familiar environment while writing code that runs both on a local data sample and on the full dataset in the big data system. By developing and debugging an algorithm locally on a representative sample, engineers can fit it into the analysis process that is already part of their workflow and then apply the same code to the full dataset, which is the key to reducing the time a full-scale analysis takes.
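
A minimal sketch of that sample-first workflow, again in generic Python/PySpark rather than any particular modeling tool: the analysis is written once as a function, checked against a small local sample, and then pointed at the full dataset on the cluster. The file paths, column names and threshold are placeholders.

```python
# Hypothetical sketch: develop on a local sample, then run on the full dataset.

import pandas as pd
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def flag_overheating(df: DataFrame) -> DataFrame:
    """Flag readings above a temperature threshold (placeholder logic)."""
    return df.withColumn("overheating", F.col("temperature") > 90.0)

spark = SparkSession.builder.getOrCreate()

# 1. Iterate quickly against a small local sample.
sample_df = spark.createDataFrame(pd.read_csv("sensor_sample.csv"))
flag_overheating(sample_df).show(5)

# 2. Once the logic looks right, apply the same function to the full dataset.
full_df = spark.read.parquet("hdfs:///data/sensor_logs/")
flag_overheating(full_df).write.mode("overwrite").parquet("hdfs:///data/flagged/")
```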

Big data is a key enabler for implementing domain expertise in an existing workforce. In parallel, machine learning has become an essential part of the analysis process, supplying the new algorithms needed to make sense of the massive amount of data that is readily available to engineers.

Whether the goal is to uncover relationships in data (unsupervised learning) or train a model that can predict new outcomes (supervised learning), there are hundreds of algorithms that can be used to build a model. Often the best way to figure out which algorithm will work for a problem is simply to test several and compare the results. Yet this can prove extremely challenging and time consuming for engineers who aren't familiar with a wide variety of interfaces.
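
As a rough illustration of that try-and-compare approach, written here as a short scikit-learn script using a bundled sample dataset as a stand-in for real project data, several candidate classifiers can be trained and scored with cross-validation in a few lines:

```python
# Minimal sketch: train several candidate models on the same data and compare
# cross-validated accuracy. The dataset here is a stand-in for real project data.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(n_estimators=200),
    "support vector machine": SVC(),
}

# Five-fold cross-validation gives a quick, like-for-like comparison.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```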

As with big data, software modeling tools address this problem by providing point-and-click apps that train and compare multiple machine learning models. This is critical for engineers transitioning into a data science role because it enables them to identify “quick wins” where machine learning is an improvement over the traditional iterative process. This approach also prevents them from spending days or weeks tuning a model to a dataset that is not well-suited to machine learning.

Big data and machine learning have long been poised to bring new solutions to long-standing business problems. Now, by coupling these technologies with the domain knowledge that engineers bring to the table, these tools can be taken far beyond their traditional uses in web and marketing applications. Engineers will find they're well positioned to tackle problems that improve business outcomes at a broader level; a task once reserved for data scientists is now in the capable hands of domain experts.

Seth DeLand is the product marketing manager for data analytics at MathWorks.
