In this blog, I survey the different kinds of data scientists, and
how data science compares and overlaps with related fields such as machine
learning, deep learning, AI, statistics, IoT, operations research, and applied
mathematics. As data science is a broad discipline, I start by describing the
various types of data scientists that one may encounter in any business
setting: you might even discover that you are a data scientist yourself,
without knowing it. As in any scientific discipline, data scientists may borrow
techniques from related disciplines, though we have built our own arsenal,
especially techniques and algorithms to handle very large unstructured data
sets in automated ways, even without human interaction, to perform
transactions in real time or to make predictions.
1. Different Types of Data Scientists
To get started and gain some historical perspective, you can read my
article about 9 kinds of data scientists, published in 2014, or my article
where I compare data science with 16 analytic disciplines, also published in
2014.
The following articles, published during the same period, are
still useful:
- Data Scientist vs. Data Architect
- Data Scientist vs. Data Engineer
- Data Scientist vs. Statistician
- Data Scientist versus Business Analyst
More recently (August 2016), Ajit Jaokar discussed Type A
(Analytics) versus Type B (Builder) data scientists:
The Type A Data Scientist can code well enough to work with data
but is not necessarily an expert. The Type A data scientist may be an expert in
experimental design, forecasting, modeling, statistical inference, or other
things commonly taught in statistics departments. Generally speaking though,
the work product of a data scientist is not "p-values and confidence
intervals" as academic statistics sometimes seems to suggest (and as it
sometimes is for traditional statisticians working in the pharmaceutical
industry, for example). At Google, Type A Data Scientists are known variously
as Statistician, Quantitative Analyst, Decision Support Engineering Analyst, or
Data Scientist, and possibly a few more.
Type B Data Scientist: The B is for Building. Type B Data
Scientists share some statistical background with Type A, but they are also
very strong coders and may be trained software engineers. The Type B Data
Scientist is mainly interested in using data "in production." They
build models that interact with users, often serving recommendations
(products, people you may know, ads, movies, search results).
I also wrote about the ABCD's of business process optimization,
where D stands for data science, C for computer science, B for business
science, and A for analytics science. Data science may or may not involve
coding or mathematical practice, as you can read in my article on
low-level versus high-level data science. In a startup, data scientists wear
several hats, such as executive, data miner, data engineer or architect,
researcher, statistician, modeler (as in predictive modeling) or developer.
While the data scientist is mostly portrayed as a coder experienced
in R, Python, SQL, Hadoop and statistics, this is just the tip of the iceberg,
made popular by data camps focusing on teaching some elements of data
science. But just like a lab technician can call herself a physicist, the real
physicist is much more than that, and her domains of expertise are varied:
astronomy, mathematical physics, nuclear physics (which is borderline
chemistry), mechanics, electrical engineering, signal processing (also a
sub-field of data science) and many more. The same can be said about data
scientists: fields are as varied as bioinformatics, information technology,
simulations and quality control, computational finance, epidemiology,
industrial engineering, and even number theory.
In my case, over the last 10 years, I specialized in
machine-to-machine and device-to-device communications, developing systems to
automatically process large data sets and to perform automated transactions:
for instance, purchasing Internet traffic or automatically generating content.
It implies developing algorithms that work with unstructured data, and it is at
the intersection of AI (artificial intelligence), IoT (Internet of Things), and
data science. This is referred to as deep data science. It is relatively
math-free, and it involves relatively little coding (mostly APIs), but it is
quite data-intensive (including building data systems) and relies on brand new
statistical technology designed specifically for this context.
Before that, I worked on credit card fraud detection in real time.
Earlier in my career (circa 1990) I worked on image remote sensing
technology, among other things to identify patterns (or shapes or features, for
instance) in satellite images and to perform image segmentation: at that time
my research was labeled as computational statistics, but the people doing the
exact same thing in the computer science department next door in my home
university called their research artificial intelligence. Nowadays, it would
be called data science or artificial intelligence, the sub-domains being signal
processing, computer vision or IoT.
Also, data scientists can be found anywhere in the lifecycle of
data science projects, at the data-gathering stage, or the data exploration
stage, all the way up to statistical modeling and maintaining existing systems.
2. Machine Learning versus Deep Learning
Before digging deeper into the link between data science and
machine learning, let's briefly discuss machine learning and deep learning.
Machine learning is a set of algorithms that train on a data set to make
predictions or take actions in order to optimize some systems. For instance,
supervised classification algorithms are used to classify potential clients
into good or bad prospects, for loan purposes, based on historical data. The
techniques involved, for a given task (e.g. supervised clustering), are varied:
naive Bayes, SVM, neural nets, ensembles, association rules, decision trees,
logistic regression, or a combination of many. For a detailed list of
algorithms, click here. For a list of machine learning problems, click here.
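To make the loan example concrete, here is a minimal sketch of supervised classification using logistic regression, one of the techniques listed above. It is written in plain Python with no external libraries. The data, the two features (income and debt ratio), and all function names are hypothetical, chosen only for illustration; a real system would use far more data and a tested library.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log-loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (good prospect) or 0 (bad prospect)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical historical data: (income in $10k, debt-to-income ratio)
history = [(2.0, 0.9), (2.5, 0.8), (3.0, 0.7), (3.5, 0.85),
           (6.0, 0.2), (7.0, 0.3), (8.0, 0.1), (6.5, 0.25)]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = loan repaid, 0 = default

weights, bias = train_logistic(history, outcome)

# Classify new (hypothetical) applicants with the learned parameters
new_clients = [(7.5, 0.15), (2.2, 0.95)]
predictions = [predict(weights, bias, c) for c in new_clients]
print(predictions)
```

The point of the sketch is that nobody hand-writes the decision rule: the weights are learned from the historical outcomes, then reused on clients the algorithm has never seen.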
All of this is a subset of data science. When these algorithms
are automated, as in automated piloting or driver-less cars, it is called AI,
and more specifically, deep learning. Click here for another article comparing
machine learning with deep learning. If the data collected comes from sensors
and if it is transmitted via the Internet, then it is machine learning or data
science or deep learning applied to IoT.
Some people have a different definition of deep learning. They
consider deep learning to be neural networks (a machine learning technique)
with deeper layers. The question was asked on Quora recently, and below is a
more detailed explanation (source: Quora):
AI (Artificial intelligence) is a subfield of computer science,
that was created in the 1960s, and it was (is) concerned with solving tasks
that are easy for humans, but hard for computers. In particular, a so-called
Strong AI would be a system that can do anything a human can (perhaps without
purely physical things). This is fairly generic and includes all kinds of
tasks, such as planning, moving around in the world, recognizing objects and
sounds, speaking, translating, performing social or business transactions,
creative work (making art or poetry), etc.
NLP (Natural language processing) is the part of AI that has
to do with language (usually written).
Machine learning is concerned with one aspect of this: given
some AI problem that can be described in discrete terms (e.g. out of a
particular set of actions, which one is the right one), and given a lot of
information about the world, figure out what is the “correct” action, without
having the programmer program it in. Typically some outside process is needed
to judge whether the action was correct or not. In mathematical terms, it’s a
function: you feed in some input, and you want it to produce the right output,
so the whole problem is simply to build a model of this mathematical function
in some automatic way. To draw a distinction with AI, if I can write a very
clever program that has human-like behavior, it can be AI, but unless its
parameters are automatically learned from data, it’s not machine learning.
Deep learning is one kind of machine learning that’s very
popular now. It involves a particular kind of mathematical model that can be
thought of as a composition of simple blocks (function composition) of a few
types, and where some of these blocks can be adjusted to better predict the
outcome.
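The idea of composing simple blocks, some of them adjustable, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the block names (`linear`, `compose`), the tiny two-layer model, the target function, and the finite-difference gradient are all hypothetical choices made for readability. Real deep learning frameworks use many more blocks and exact, automatically computed gradients.

```python
import math
from functools import reduce

def linear(weight, bias):
    """Adjustable block: x -> weight * x + bias."""
    return lambda x: weight * x + bias

def compose(*blocks):
    """Chain blocks left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda value, block: block(value), blocks, x)

def build_model(params):
    """Model = adjustable block, fixed non-linear block, adjustable block."""
    w1, b1, w2, b2 = params
    return compose(linear(w1, b1), math.tanh, linear(w2, b2))

def mean_squared_error(params, data):
    model = build_model(params)
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def gradient_step(params, data, lr=0.05, eps=1e-6):
    """One gradient-descent step, with gradients estimated numerically."""
    base = mean_squared_error(params, data)
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        grads.append((mean_squared_error(shifted, data) - base) / eps)
    return [p - lr * g for p, g in zip(params, grads)]

# Hypothetical task: approximate sin(x) on [-2, 2]
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]

params = [0.5, 0.0, 0.5, 0.0]  # arbitrary starting values
loss_before = mean_squared_error(params, data)
for _ in range(500):
    params = gradient_step(params, data)  # adjust the blocks
loss_after = mean_squared_error(params, data)

print(loss_before, loss_after)
```

Only the two `linear` blocks are adjusted; `math.tanh` stays fixed. Stacking more such blocks is what makes a model "deeper".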
What is the difference between machine learning and statistics?
This article tries to answer the question. The author writes
that statistics is machine learning with confidence intervals for the
quantities being predicted or estimated. I tend to disagree, as I have built
engineer-friendly confidence intervals that don't require any mathematical or
statistical knowledge.
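As an illustration of an interval that needs no formula or distributional theory, here is a sketch of the percentile bootstrap. This is a well-known generic resampling technique and is not claimed to be the method I refer to above. The sample values and the function name `bootstrap_interval` are hypothetical.

```python
import random
import statistics

def bootstrap_interval(sample, stat=statistics.mean, level=0.95,
                       n_resamples=2000, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic each time, and read the interval off the sorted estimates."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical measurements, e.g. response times in milliseconds
response_times_ms = [12.1, 9.8, 11.4, 10.7, 13.5, 9.9, 12.8, 10.2, 11.9, 10.5]

low, high = bootstrap_interval(response_times_ms)
print(low, high)
```

The whole procedure is sorting and counting, which is why an engineer can apply it without any statistical background.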
3. Data Science versus Machine Learning
Machine learning and statistics are part of data science. The
word learning in machine learning means that the algorithms depend on some
data, used as a training set, to fine-tune some model or algorithm parameters.
This encompasses many techniques such as regression, naive Bayes or supervised
clustering. But not all techniques fit in this category. For instance,
unsupervised clustering - a statistical and data science technique - aims at
detecting clusters and cluster structures without any a priori knowledge or
training set to help the classification algorithm. A human being is needed to
label the clusters found. Some techniques are hybrid, such as semi-supervised
classification. Some pattern detection or density estimation techniques fit in
this category.
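Unsupervised clustering can be sketched with k-means, one common technique among many. The sketch below is plain Python; the points, the choice of k = 2, and the simple evenly spaced initialization are assumptions made for illustration. Note that no labels are passed to the algorithm.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    # Simple deterministic start: k points spread evenly through the list
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return assignment, centroids

# Hypothetical unlabeled observations with two visible groups
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]

assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The algorithm only returns cluster numbers. Deciding that one cluster means, say, "small accounts" and the other "large accounts" is the labeling step left to a human being.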
Data science is much more than machine learning though. Data, in
data science, may or may not come from a machine or mechanical process (survey
data could be manually collected, clinical trials involve a specific type of
small data) and it might have nothing to do with learning as I have just
discussed. But the main difference is the fact that data science covers the
whole spectrum of data processing, not just the algorithmic or statistical
aspects. In particular, data science also covers.
Of course, in many organizations, data scientists focus on only
one part of this process. To read about some of my original contributions to
data science