Are Data Scientists Becoming Obsolete?
This question comes up from time to time. Salaries are not increasing as fast as they used to, though this is natural for any discipline reaching some maturity. Some job seekers claim that it is no longer easy to find a job as a data scientist. Some employers have complained about the costs associated with a data science team, and about ROI expectations not being met. And some employees, especially those with a PhD, have complained that the job can be boring.
I believe there is some truth to all of this, but my opinion is more nuanced. Data scientist is too generic a term, and the job is often not even related to science. About 20 years ago, I experienced some disillusionment of my own with my job title of statistician. There were so many promising paths, but the statistical community, in part because of the major statistical associations and the academic training of the time, missed some big opportunities. It focused more and more on narrow areas such as epidemiology or census data, and failed to catch on to serious programming (beyond SAS and R) and algorithms. Back then I was working on digital image processing, and I saw the field of statistics miss the machine learning opportunity, and operations research in particular. I eventually called myself a computational statistician: that’s what I was doing, and it was getting more and more different from what my peers were doing. I am sure that by now statistics curricula have caught up, and include more machine learning and programming.
More recently, I called myself a data scientist, but today I think the title does not represent well what I do. Computational or algorithmic data scientist would be a much better description. And I think this applies to many data scientists. Some, focusing more on the data aspects, could call themselves data science engineers or data science architects. Some may find the term business data scientist more appropriate. Junior ones are probably better described as analysts.
Some progress has certainly been made in the last 5 years. Applicants are better trained, hiring managers are more knowledgeable about the field and have clearer requirements, and applicants have a better idea as to whether an advertised position is as interesting as it sounds in the description. Indeed, many jobs are filled without a job ad ever being posted, by directly contacting potential candidates whom the hiring manager knows, even if only by word of mouth. While there is still no large, highly recognized professional association or well-known, comprehensive certification for data scientists as there is for actuaries (and I don’t think one is needed), there are clearer paths to excellence in the profession, whether as a company or as an employee. A physicist familiar with data could easily succeed with little on-the-job practice. There are companies open to hiring people from various backgrounds, which broadens the possibilities. And given the numerous poorly solved problems (they pop up faster than they can properly be solved), the future looks bright. Examples include counting the actual number of people who have been infected by Covid (which requires imputation methods, and might be twice as high as official numbers), assessing the effectiveness of various Covid vaccines versus natural immunization, better detection of fake reviews, fake recommendations, or fake news, and optimizing driving directions from Google Maps by including more criteria in the algorithm, such as HOV lanes, air quality, scarcity of gas stations, and peak commute times (more on this in my next article, about my 3,000-mile road trip using Google navigation).
Renaissance Technologies is a good example of a company that hires from varied backgrounds: they have been working on quantitative trading since 1982, developing black-box strategies for high-frequency trading and mastering trading cost optimization. Many times, they had no idea, and did not care, why their automated self-learning trading system made some obscure trades (leveraging volatile patterns undetectable by humans or unused by competitors). Even so, it is by far the most successful hedge fund of all time, with an annualized return of more than 66 percent (that is, per year, each year on average) for about 30 years. They never hired traditional quants or data scientists, though some of their top executives came from IBM, with a background in computational linguistics. Many core employees had backgrounds in astronomy, physics, dynamical systems, and even pure number theory, but not in finance.
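To make the phrase "annualized return" concrete, here is a minimal Python sketch. It assumes the usual definition (the constant yearly rate that, compounded, reproduces the total growth, in other words a geometric average rather than an arithmetic one) and it assumes that all gains are reinvested and that fees are ignored, which a real fund may not allow. The function names are mine, chosen for illustration; the 66 percent and 30 years are the figures quoted above.

```python
import math


def growth_factor(annual_return: float, years: int) -> float:
    """Total growth multiple if the same rate compounds every year."""
    return (1.0 + annual_return) ** years


def annualized_return(total_growth: float, years: int) -> float:
    """Constant yearly rate that reproduces a given total growth multiple."""
    return total_growth ** (1.0 / years) - 1.0


def annualized_from_yearly(yearly_returns: list[float]) -> float:
    """Annualized (geometric average) return from a list of yearly returns."""
    total = math.prod(1.0 + r for r in yearly_returns)
    return annualized_return(total, len(yearly_returns))


if __name__ == "__main__":
    # Figures quoted in the article; full reinvestment is an assumption.
    factor = growth_factor(0.66, 30)
    print(f"Growth multiple after 30 years at 66%: {factor:,.0f}x")

    # Arithmetic and geometric averages can differ a lot:
    # +100% followed by -50% averages +25% arithmetically,
    # yet the investor ends exactly where they started.
    yearly = [1.0, -0.5]
    print(f"Arithmetic mean: {sum(yearly) / len(yearly):.2%}")
    print(f"Annualized:      {annualized_from_yearly(yearly):.2%}")
```

The second part of the sketch shows why the distinction matters: a simple average of yearly returns can look healthy even when the compounded result is flat, so an annualized figure sustained over decades is a much stronger statement.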
Incidentally, I have used many machine learning and computational data science techniques, processing huge volumes of multivariate data (integers or real numbers) with efficient algorithms, to try to pierce some of the deepest secrets in number theory. In experimental, probabilistic, or computational number theory, you routinely uncover and leverage hard-to-find patterns in an ocean of seemingly very noisy data, data that behaves worse than many messy business data sets (indeed, you are dealing with chaotic processes). So I can easily imagine that a math background with strength in these areas would be helpful in quantitative finance, and certainly elsewhere, in fraud detection or risk management for example. I came to call these chaotic environments gentle or controlled chaos, because in the end, they are less chaotic than they appear to be at first glance. I am sure many people in the business world can relate to that.
Data scientist might not be a great job title, as it means so many different things to different people. Better job titles include data science engineer, algorithmic data scientist, mathematical data scientist, computational data scientist, business data scientist, or analyst, reflecting the various fields that data science covers. There are still many unsolved problems, and the list is growing faster than that of solved problems, so the future looks bright. Some problems, such as spam detection and maybe even automated translation, have seen considerable progress. Employers and employees have become better at finding a good match, and pay scales may not increase much more. Some tasks, such as data cleaning, may disappear in the future, taken over by robots. Even coding might be absent from some jobs, or partially automated. For instance, the Data Science Central article that you are reading now is published on a platform that I created in 2008 without writing a single line of code. This will open up more possibilities, as it frees a lot of time for the data scientist to focus on higher-level tasks.
About the author: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent is also a self-publisher at DataShaping.com, and has founded or co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by TechTarget). A selection of Vincent’s most recent articles and books can be found on vgranville.com.