Data Mining

A little about me

  • My background is largely in machine learning as applied to physical science

  • The reason I’m here is to help introduce you to the skills and resources you’ll need to work in the modern world of data mining.

  • The first time I cut code for money was in 2014 and was for embedded data collection/analysis.

  • My area of interest is the intersection of natural science and computer/data science, and how to use principles from one domain to empower the other.

A little about you

Right Now:

  • What are your name and major?
  • Why are you taking this class?
  • What do you expect from this class?
  • What do you intend to do with the information you learn in this class?

Later (turn in exit ticket)

And now for the syllabus…

Get into groups for the following questions…

What is Data Mining?

Which of these things are part of Data Mining? Make a case for why or why not.

  • Algorithm Design

  • Statistics

  • Engineering

  • Optimization

  • Computer Science

  • Domain specific (expert) knowledge

With your group

  • Enumerate three fields for which you believe data mining is essential, and one which you believe data mining is not essential.

Introducing Data Mining

  • Data mining is the act of extracting actionable information from a mass of data.

  • It indeed includes aspects of algorithm design, statistics, engineering, optimization, and computer science.

  • In addition, it requires expertise in the domain you’re working in (much like the best data science).

Data mining can be summarized in the following steps:

  1. Create a dataset which describes some aspect of the real world. Our datasets have two parts:
     • Samples: “observations”, real-life things, “rows”
     • Features: descriptions of our samples, “columns”
  2. Tune the algorithm. Our data mining methods have parameters which influence their efficacy. For example:
  Person   Height   Short or Tall
  1        5’6”     Short
  2        6’0”     Tall
  3        5’2”     Short
  4        6’6”     Tall

What’s the parameter we care about?
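
One way to make this question concrete is a minimal sketch that treats the table above as a dataset and assumes the parameter of interest is a height cutoff. The names `samples`, `classify`, `accuracy`, and the whole-inch search range are hypothetical choices for illustration, not something fixed by the course material.

```python
# Samples (rows) taken from the table above. The single feature is height,
# converted to inches so it can be compared numerically.
samples = [
    {"person": 1, "height_in": 5 * 12 + 6, "label": "Short"},
    {"person": 2, "height_in": 6 * 12 + 0, "label": "Tall"},
    {"person": 3, "height_in": 5 * 12 + 2, "label": "Short"},
    {"person": 4, "height_in": 6 * 12 + 6, "label": "Tall"},
]


def classify(height_in, threshold_in):
    """Label a height as Tall if it meets the cutoff, otherwise Short."""
    return "Tall" if height_in >= threshold_in else "Short"


def accuracy(threshold_in):
    """Fraction of samples whose predicted label matches the given label."""
    correct = sum(
        classify(s["height_in"], threshold_in) == s["label"] for s in samples
    )
    return correct / len(samples)


# "Tuning" here means trying candidate cutoffs (an assumed range of
# 60 to 80 inches) and keeping one that scores best on the data.
candidates = range(60, 81)
best_threshold = max(candidates, key=accuracy)
print(best_threshold, accuracy(best_threshold))
```

With only four samples, several cutoffs separate the two labels equally well, so the data alone does not pin down a single best value; that ambiguity is worth discussing.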

Environment

  • Python

  • JupyterLab

  • scikit-learn

Reading

Read up to page 13 of the text (in Chapter 1).