Received a question about data science and analytics, and thought it'd be interesting to post my response here as well for anyone who is interested.
Would you mind sharing a bit about what kind of things these corporations are looking for for their analytics teams? Do they want people with versatility across many statistical languages? Are they looking for advanced degrees in statistics/CS?
The term 'data science' is ambiguous, and there is no widely accepted definition of it. With that said, I use the term broadly to cover any position where an employee uses technical tools and inferential statistics to analyze data. Of course, not everyone who analyzes data is a real "data scientist." For example, a financial analyst who uses Excel to develop financial models is not a 'data scientist,' regardless of how much math they know. Ultimately, I know a data scientist by the technical and statistical tools that they use.
Another important thing to remember is that many companies employ "data scientists," while other businesses have people with similar skills under job titles such as quantitative analyst, statistical analyst, or data modeler. By and large, I would consider all of these people 'data scientists,' given that they use inferential statistics and programming languages to perform data munging and predictive modeling. Keep in mind that these skills are the minimum one should know. Many data scientists work on problems that require some knowledge of machine learning, so it's better to know more than just basic hypothesis testing.
I like Drew Conway's attempt at explaining what data science is. However, I think one needs to know much more than OLS regression, as the problems you encounter in the real world are much more complex than what a simple general linear model can explain.
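To make the point about simple linear models concrete, here is a minimal sketch in plain Python. The data and the helper names (fit_ols, r_squared) are made up for illustration and do not come from any of the sources mentioned here. It fits a one-variable OLS line to a perfectly linear series and to a curved one, then compares how much variance each fit explains.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data (hypothetical): one linear relationship, one curved.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]
curved_ys = [x ** 2 for x in xs]

r2_linear = r_squared(xs, linear_ys, *fit_ols(xs, linear_ys))
r2_curved = r_squared(xs, curved_ys, *fit_ols(xs, curved_ys))

print(f"R^2 on linear data: {r2_linear:.2f}")
print(f"R^2 on curved data: {r2_curved:.2f}")
```

On this toy data the straight line explains the linear series completely and the curved series not at all, even though the curved series is fully determined by x. Real data is rarely this clean-cut, but this is the kind of gap that pushes you toward richer models.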
As for what corporations are looking for in their analytics teams, I suggest looking at 'data scientist' job postings to gauge what types of skills they want. They vary quite drastically, but here are a few examples.
Data Scientist @ Sears
Desired Skills & Experience
- MS/PhD. degree in Statistics, Mathematics, Physics, Operations Research, CS, Econometrics or equivalent/related degree.
- 2+ years relevant experience with a proven track record of leveraging data analysis to drive significant business impact preferred.
- Above average capabilities with basic analysis tools of Microsoft Office (Excel, Power Point, Outlook, Word, & Access).
- Ability to quickly learn and gain deep understanding of SHC business processes.
- Ability to perform thorough analysis of complex data, draw sound conclusions, and devise actionable strategies.
- Expertise in predictive modeling. Must have knowledge/experience in some/all of the following: Multivariate Regression, Logistic Regression, Support Vector Machines, Bootstrap Aggregation, Boosting, Decision Trees, and Time Series Analysis.
- Experience in Optimization, Stochastic Processes, Experiment Analysis, and/or Bootstrapping a plus.
- Experience in Hadoop-based tools (Mahout, Hive, Pig) a plus.
- Proficiency in at least one statistical analysis tools such as R, SAS, or KXEN.
- SQL proficiency a plus.
- Ability to prioritize and execute multiple tasks in a highly dynamic environment.
- Detail oriented with proven analytical, problem identification and resolution skills.
- Ability to work effectively in an unstructured and fast-paced environment, and have a high degree of self-management.
- Proven interpersonal, communication and presentation skills – must be able to explain technical concepts and analysis implications clearly to a wide audience.
- Extremely detail oriented with superior analytical, problem solving, pragmatism and organization skills.
- Bachelor’s degree, or foreign equivalent, in Computer Science, Engineering, Statistics or Mathematics or a closely related discipline, five (5) years of experience in the Job Offered or over four (4+) years of experience in Analytics, in Actuary roles, as a statistician, or as a Data Miner.
- Masters in Computer Sciences, Engineering, Statistics, Mathematics or similarly related discipline.
Data Scientist @ YP
- Passion and deep technical competency in mathematical modeling, statistics and business analytics.
- Data Warehousing, Business Intelligence and Analytics experience, preferred
- Demonstrated professional experience with deterministic and probabilistic statistical methods and a proven history of statistical inference on large scale data.
- Proficiency in at least one of the following statistical toolkits: R, SPSS, Matlab, Mahout, SAS
- Programming experience in at least one of the following languages: C/C++, Java, Python or Perl and a high level of skill in SQL.
- Interface with the PDS team and business analysts and leaders to produce required data, models and analysis working in an agile environment working with cross-functional stakeholders and owners.
- Strength in distributed computing platforms and large scale data processing/computing environments with experience using Apache Hadoop projects such as Hadoop, HBase, Pig, Hive, a plus
- Proficient with mathematical and statistical methods and techniques including regression, A/B testing, cluster analysis, Monte Carlo simulation, neural networks, decision trees, principal component analysis, and predictive modeling
- Proven track record of managing projects and deliverables across highly cross-functional teams.
BS or MS in statistics, mathematics or other related discipline.
Data Scientist @ Twitter
Conduct statistical analyses to learn from and scale to petabytes of data
Use Map-Reduce frameworks such as Pig and Scalding, statistical software such as R, and scripting languages like Python and Ruby
Write and interpret complex SQL queries for standard as well as ad hoc data mining purposes
Communicate findings to product, engineering, and management teams
MS or PhD in Statistics, Math, Engineering, Operations Research, Computer Science, or another quantitative discipline.
Experience with statistical programming environments like R or Matlab
Experience with scripting languages (Python and/or Ruby), regular expressions, etc.
Interest in using discrete math, probability, and statistics to answer complex questions
Quantitative Analyst @ Google
- Apply advanced statistical methods
- Work with large, complex data sets
- Solve difficult, non-routine problems
- Clearly communicate highly technical results and methods
- Interact cross-functionally with a wide variety of people and teams
Minimum Qualifications
- PhD in Statistics or Econometrics, (In lieu of degree, 4 years of relevant work experience).
- 2 years of relevant work experience.
- Experience with R/SPlus; coursework in Bayesian methods, longitudinal analysis and experimental design.
Preferred Qualifications
- 3 years of relevant work experience.
- Experience with Python, Perl and SQL.
Statistical Analyst @ Inte Q
Duties and Responsibilities:
- Develop predictive models in support of email and direct marketing campaigns
- Develop attrition and activation models based on client/program specific needs
- Develop customer segmentation for management of retail client loyalty/relationship marketing portfolios. Work with internal and external business partners to develop marketing strategies for these segments
- Design controlled experiments that can be used to measure the changes in customer behavior across many treatment groups
- Design and measure customer loyalty programs
- Consult with Business Partners and clients on analytic methodologies for customer analyses and present and explain results in a clear precise manner
- Develop customer scorecards for reporting of customer metrics and performance of loyalty/customer relationship programs. Must be able to define and implement flexible customer reporting
Desired Skills & Experience
- Good oral and written skills are mandatory
- Prior experience using campaign management software (Unica, Campaign Runner)
- Need to be able to translate customer requests into an analytic framework
- 2 – 5 years’ work experience in a retail, CRM, credit or consulting environment would be preferred but is not mandatory
- Bachelor or Master’s degree in Statistics, Mathematics or other quantitative discipline
- Exposure to large relational databases and SQL is important
- Strong programming skills in SAS required
- Consulting experience or training in structured methodologies advantageous
For anyone curious about the problems that data scientists/quantitative analysts/statistical analysts work with, here's a presentation that might be helpful.
Regarding "I had assumed that all this time spent with SAS, etc was a huge waste because all I would be able to do is data analytics": working in analytics or data science takes a fairly specialized skill set, and I hope you don't view it as just doing data analytics. In your work experience, I'm sure you've seen the analytical incompetence of most business people, including those who work in marketing or finance. I view the data science work that I do as using statistics to produce new insights that can help the "blind" people who have traditionally worked in the marketing and finance sectors. We're helping these people gain a deeper understanding of their industries, even though we're "new" and those business people have years of "experience." I don't give a shit about how many years of experience a person has, data/stats > experience. That is what sets you apart, and it is why your work shouldn't be characterized as a 'waste of time' or as just data analytics. Of course, you'll find that a lot of business people don't care about statistics, so they'll likely ignore your work and insights. That is the only reason I characterized my past two years as a waste of time: I was at a marketing company that claimed to be serious about quantitative analysis but simply wasn't, and it ended up wasting my time and skills.
Edited by amathew - 7/31/13 at 3:36pm