Statistics, Data Science, and Data Mining Discussion Thread (Business Intelligence, Analytics, etc)

Discussion in 'Business, Careers & Education' started by amathew, Mar 8, 2014.

  1. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    Let's talk statistics, 'big data', and data mining in here.

    What kind of work do you do? What problems do you work on? What tools do you use? Random thoughts? Book suggestions or blog posts? etc
    Whatever, as long as it's related to statistics or statistical computing.

    I'm currently working on forecasting leads and sales for an automobile manufacturer, and also on applying association rule algorithms to clickstream data to identify common trends in consumer browsing behavior on a website. Besides that, I do a lot of natural language processing of survey verbatims for the purpose of classification and extracting common themes in those different classifications.

    By and large, I use R, MySQL, and Python for all my analysis. Occasionally, I'll use Tableau for creating visualizations. In my old job, I had some Hadoop and NoSQL exposure, but I'm now working with much smaller data sets (3 to 5 GB data files). I'd much rather work with 'small data' than 'big data.'

    Blog posts I'm enjoying:
    http://prdeepakbabu.wordpress.com/2010/02/24/association-rule-mining/
    http://blog.revolutionanalytics.com/2014/03/r-and-hidden-markov-models.html
     
    Last edited: Mar 12, 2014


  2. Reggs

    Reggs Senior member

    Messages:
    5,571
    Likes Received:
    387
    Joined:
    Mar 11, 2006
    Location:
    The Internet
    I work in marketing and have an old stats textbook on my desk. I tried to look up something Friday but was not able to find it because I forgot the name.

    Basically, it's a way to analyze a queue. I remember in college this was tested by telling you that there were 4 ticket counters. Each counter could process X number of people over a given time. Then you figure out how many ticket counters are needed for a given number of people.

    The teacher said they use this for stuff like scheduling workers for checkout lines at grocery stores to handle peak hours and such.

    If anyone could just tell me the name of what can be used for this, I'll look it up in my textbook Monday.
     


  3. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    Simulation

    In the context of checkout lines, let's say that you wanted to know how long it should take you to get 'served.' If you take the number of available tellers and the average session length for a teller, you can come up with an average wait time per person. In reality, the number of available tellers is probably small, so simulation can be used to run the calculation many times, say 1000, and grab the average session length from each iteration.
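
A rough sketch of the simulation idea described above, in Python. Everything specific here is an assumption made for illustration and not something stated in the thread: customers arrive at random (exponential gaps), service times are exponential, the line is first-come-first-served, and the function names and rates are made up.

```python
import heapq
import random


def simulate_average_wait(num_counters, arrival_rate, service_rate,
                          num_customers, rng):
    """Run one simulated shift and return the mean time spent waiting in line.

    Assumes exponential inter-arrival and service times (rates per minute)
    and a single first-come-first-served line feeding all counters.
    """
    # Each heap entry is the time at which one counter next becomes free.
    free_at = [0.0] * num_counters
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(arrival_rate)      # next customer arrives
        earliest_free = heapq.heappop(free_at)      # counter that frees up first
        start = max(clock, earliest_free)           # wait only if all are busy
        total_wait += start - clock
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / num_customers


def mean_wait(num_counters, arrival_rate, service_rate,
              runs=1000, num_customers=200, seed=0):
    """Average the per-run mean wait over many simulated runs."""
    rng = random.Random(seed)
    waits = [
        simulate_average_wait(num_counters, arrival_rate, service_rate,
                              num_customers, rng)
        for _ in range(runs)
    ]
    return sum(waits) / len(waits)


def counters_needed(arrival_rate, service_rate, target_wait,
                    max_counters=50, **kwargs):
    """Smallest number of counters whose simulated mean wait meets the target."""
    for counters in range(1, max_counters + 1):
        wait = mean_wait(counters, arrival_rate, service_rate, **kwargs)
        if wait <= target_wait:
            return counters, wait
    raise ValueError("target wait not reachable within max_counters")
```

For example, `counters_needed(2.0, 1.0, target_wait=0.5)` asks how many counters keep the average wait under half a minute when two customers arrive per minute and each counter serves one per minute. The answer depends entirely on the assumed arrival and service distributions, so treat it as a starting point.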
     
    Last edited: Mar 8, 2014


  4. gettoasty

    gettoasty Senior member

    Messages:
    12,637
    Likes Received:
    6,769
    Joined:
    Feb 8, 2010
    Location:
    Home
    I wish I used my statistics degree more post-college... seems like a waste taking all those classes and getting the 2nd degree.

    What are the job prospects like nowadays for a statistics major? Is an MS/PhD still required? I worked in the statistics department for a bit and most grad students were PhD candidates, but I only witnessed two people leave and work in industry. Seems like the majority want to get into the teaching track/associate professor positions.
     


  5. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    1. It depends on the job. The more emphasis the position places on advanced regression analysis, classification models, or machine learning, the more likely it is that an MS/PhD will be necessary. So unfortunately the positions where people are working on fun problems do require more education. Of course, if you just want to be an analyst at a random company doing hypothesis testing and linear regression, those jobs are certainly there, but there's also plenty of competition from non-stats majors in everything from economics to physics to social sciences.

    2. Job prospects are pretty good. With that said, the 'big thing' right now is big data so what companies want is both the statistical knowledge along with expertise in programming with big data technologies like Hadoop, NoSQL, etc. Most of the data scientists I've met that work on big data have had backgrounds in computer engineering or physics, so statisticians aren't necessarily benefiting from the increased demand for statistical knowledge.

    3. Demand for statisticians varies by industry. A lot of people end up working in finance or at tech companies. However, you're seeing more people go to marketing companies and ad agencies as they want to make smarter data-driven decisions. I started my career at a tech company and now I'm at a digital ad agency; best move ever. So many incredibly challenging problems (attribution modeling, click path models, etc) though there are issues as well, namely poor data warehousing methods.

    4. Statistical software is an important variable with regard to job market prospects. Given that R and Python have emerged as the most 'in demand' tools in most industries (minus big pharma, which still uses SAS) over the past decade, people with those skills are wanted. When I send out my resume, the callbacks usually involve some mention of my experience with classification models and my knowledge of R, MySQL, and Python. Knowing those three is important in today's job market. A big reason why I was hired at my current position was my knowledge of R, so don't overlook the fact that technology and statistical software are very important. Of course, ten years from now, R may be replaced with Julia and Python could be replaced by Clojure as the 'in demand' technologies to know.
     
    Last edited: Mar 8, 2014


  6. otc

    otc Senior member

    Messages:
    14,990
    Likes Received:
    4,589
    Joined:
    Aug 15, 2008
    FYI, the Coursera Data Mining course is starting up again:
    https://www.coursera.org/course/ml

    It says it started on March 3rd, but that's not really true...the first videos have been up since then, but the real first week technically starts today (with the first review quiz due on Sunday). Also, the first week is pretty basic, and if you bothered to click on this thread, it is probably just review and definition of terminology.
     


  7. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    There's also an 'Exploratory Data Analysis' course being offered by Udacity. It's good for beginners and those looking to learn R.

    https://www.udacity.com/course/ud651



    I just hope the stats and data science job market doesn't get flooded with a bunch of people who took part in a few webinars and now think they're experts in statistical modeling, Bayesian stats, etc. There are already too many of those hacks.
     
    Last edited: Mar 24, 2014


  8. capnMURPHY2021

    capnMURPHY2021 Senior member

    Messages:
    189
    Likes Received:
    61
    Joined:
    Jan 27, 2014
    The vast majority of those webinar graduates could not interpret a regression output to save their lives, so you need not worry.

    Speaking of webinars, the Stanford class on convex optimization was very interesting and useful. Great lectures, too.
     


  9. clee1982

    clee1982 Senior member

    Messages:
    8,720
    Likes Received:
    516
    Joined:
    Feb 22, 2009
    Location:
    New York City, NY, USA
    I am sure a lot of people are jumping in from communication and signal processing...
     


  10. mrscrouge

    mrscrouge Active Member

    Messages:
    25
    Likes Received:
    0
    Joined:
    Jan 29, 2011
    I graduated last summer with a BBA and decided to take another 4 classes in order to get a BI certificate. Got an introduction to SQL, database structures, a little bit of data mining, and also worked with Tableau. I just got a job in the BI department of a large corporation and I feel like I don't know anything! I'm looking to find some helpful MOOCs on relevant statistics and intro to R, as it seems to be a requirement.
     


  11. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    - In every professional job I've had (only two), the initial month has involved me feeling like I don't know anything. So is it really just 'new job jitters', or do you really feel that there are deficiencies in your understanding of how to examine and analyze data?

    - A BI department that requires knowledge of R seems odd. BI is much more about reporting, so BI tools like Tableau should be more important. R has a set of great visualization tools, but I'd choose Tableau over it if the purpose is presenting pretty graphics to business people.

    - For a basic intro to statistics, try the following book:
    Statistics in Plain English

    - To learn about hypothesis testing and regression analysis, try the following book:
    Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill

    - To learn the basics of R, start with the Intro to R pdf made available by CRAN.
    http://cran.r-project.org/doc/manuals/R-intro.pdf

    - After reading the R manual, read the following.
    Using R for Introductory Statistics by Verzani
    R Cookbook

    - Also, start looking at R questions on Stack Overflow and R bloggers
    http://stackoverflow.com/questions/tagged/r
    http://www.r-bloggers.com/

    If you're interested, I might even be able to do some one on one tutoring on R. I've done that in the past and can put together some introductory code and tricks/tasks that I commonly use. I also have a meeting at work on Tuesday where I'll be doing an informal presentation to our interns on how I use R at work. Could also share some of that information with you. I'd have a small fee but if you needed something like that, let me know.
     
    Last edited: May 18, 2014


  12. amathew

    amathew Senior member

    Messages:
    1,532
    Likes Received:
    226
    Joined:
    Nov 4, 2011
    Location:
    KS => CO => MN => CA
    My job has turned into doing a lot of sentiment analysis of social media postings. It's a lot of cluster analysis (k-means) and information retrieval. At the end of the day, I feel that sentiment analysis of social media data is bullshit, or at least unnecessarily hyped, so I'm not sold on any of it.
     
    Last edited: May 18, 2014


  13. Reggs

    Reggs Senior member

    Messages:
    5,571
    Likes Received:
    387
    Joined:
    Mar 11, 2006
    Location:
    The Internet
    I'm at a new company and need to work with data, but the customer information is all out of order. I have 1K+ "active" customers, only 377 have all their cells filled in, and even with those 377 customers it's all over the place. The customers in England are listed as UK, the UK, United Kingdom, the united kingdom. I can get that sorted out, but that's such a small % of the list.

    Most entries have an address field that has all the information dumped into it, but nothing in the city or country cell. It's just a mess. I need it to be in order, but I don't have time to deal with it. Anyone know any services that take care of stuff like this?
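
For the country-name part of the cleanup mentioned above, a lookup table is often enough. A minimal sketch in Python; the alias table and the function name are hypothetical and cover only the four spellings quoted in the post.

```python
# Hypothetical alias table: maps a cleaned-up spelling to one canonical name.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "the uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "the united kingdom": "United Kingdom",
}


def normalize_country(raw):
    """Return a canonical country name, or the trimmed input if unknown."""
    # Lowercase, drop periods (so "U.K." matches "uk"), collapse whitespace.
    key = " ".join(raw.lower().replace(".", "").split())
    return COUNTRY_ALIASES.get(key, raw.strip())
```

Values that are not in the table pass through unchanged apart from trimming, so unknown spellings stay visible for manual review.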
     


  14. otc

    otc Senior member

    Messages:
    14,990
    Likes Received:
    4,589
    Joined:
    Aug 15, 2008
    I'd try dumping it through a geocoder.

    With a limited number of addresses, Google's free API should work:

    https://developers.google.com/maps/documentation/geocoding/

    You don't actually care about the latitude and longitude, but it has the nice side effect of returning an address broken up into its component parts.

    Just concatenate everything into one string like 123 Street St, City, Country, Zip, Whatever and Google should be able to figure it out.
    I'd probably use python with the requests and json libraries...but there are about a billion ways to do this (and it could be done straight from R or SAS too)
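
A minimal sketch of the geocoder approach, under some assumptions: it uses the standard-library `urllib` in place of `requests` (which would work the same way), it assumes the JSON endpoint and `address_components` response layout of the Google Geocoding API, and it assumes you have an API key, which the current API requires as far as I know. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the Google Geocoding API's JSON output.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def split_components(geocode_response):
    """Pick street, city, country and postal code out of a geocoding response.

    Expects the parsed JSON as a dict; returns None when nothing matched.
    """
    results = geocode_response.get("results", [])
    if not results:
        return None
    parts = {}
    # Index the first result's components by each type they are tagged with.
    for component in results[0]["address_components"]:
        for component_type in component["types"]:
            parts.setdefault(component_type, component["long_name"])
    return {
        "street_number": parts.get("street_number"),
        "street": parts.get("route"),
        # UK addresses tend to report the city as "postal_town".
        "city": parts.get("locality") or parts.get("postal_town"),
        "country": parts.get("country"),
        "postal_code": parts.get("postal_code"),
    }


def geocode(address, api_key):
    """Send one concatenated address string and return its split components."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=10) as response:
        return split_components(json.load(response))
```

Calling `geocode("123 Street St, City, Country, Zip", api_key)` for each row would give back a dict you can write into the empty city and country cells. Check the API's usage limits and terms before running it over the whole list.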
     
    Last edited: May 20, 2014


  15. clee1982

    clee1982 Senior member

    Messages:
    8,720
    Likes Received:
    516
    Joined:
    Feb 22, 2009
    Location:
    New York City, NY, USA
    What kind of database do you guys use if I want to interact with R or Python? I am thinking of some personal project for fun. The database doesn't have to be relational. What would be the most intuitive way to load data, parse it, and do computation on the fly? Query speed just has to be OK; I value flexibility and the ability to calculate and manipulate data on the fly more than anything else. Oh, and the data is not necessarily static in the sense that I upload once and am done with it, but updates to the data will be relatively infrequent.
     

