CS 349-02:
Machine Learning

Spring 2017, Wellesley College

PS0: Math Self-Assessment and Python

Out: Thu, Jan 25th      Due: Mon, Jan 30th at 12:30 pm EST
Repository: https://github.com/wellesleynlp/ps0

This is an ungraded but required assignment with two components to help you get set up and running for the course. You must submit the PS0 form once you complete both parts of this assignment.

0. Getting Acquainted

  1. If you haven't already, fill out the pre-class survey.
  2. Read through the course website, particularly the FAQ, the syllabus, and the getting-help pages.
  3. If we haven't interacted much before -- or optionally, even if we have -- come introduce yourself (individually or in small groups) during my office hours in the next week or two. If you prefer to avoid waiting, sign up for a 15-min slot.
  4. Make a post on Piazza asking or answering a question about the first couple of lectures, readings, course logistics, or the Python/Git tutorial below. If you don't have a question, feel free to start a discussion, or share a link relevant to ML.

The last three items have no specific due date -- just try to finish them in the next week or two.

    1. Math Self-Assessment

    A set of exercises for you to test yourself on some basic math we'll use in this class. Set aside an hour or so. After you're done (and only after), check your answers.

    2. Python and Git Installation and Tutorial

    Install Git, Python, and some required libraries on your computer by following the instructions below. If you have any issues, please post on Piazza -- chances are that other students have run into similar problems. Working in groups during the installation may also help.

    Do your best to install these programs successfully, but don't stress out if you're running up against a wall. Come to our drop-in hours.

    2.1 Python and Libraries

    We're using Python (specifically, Python 2) in this course because:

    1. It facilitates rapid prototyping of programs compared to, say, C and Java, while still being a robust, widely used language.
    2. It has several libraries and features well suited to machine learning, numerical computation, and data processing.

    Installation on Macs

    Python is already installed on Macs. Type python in Terminal to check -- this opens an interactive session. You can write Python programs using a text editor and execute them on the command line by navigating to the directory containing your code (let's say it's called filename.py) and running
    python filename.py
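
As a quick test of the command above, here is a minimal sketch of what filename.py could contain. The function name describe_python is made up for this illustration; the script only reports which interpreter ran it, which is handy because the course assumes Python 2.

```python
from __future__ import print_function  # so print() behaves the same on Python 2 and 3
import sys


def describe_python():
    """Return a one-line summary of the interpreter running this script."""
    major, minor = sys.version_info[0], sys.version_info[1]
    # The course uses Python 2, so flag anything else.
    note = "matches the course" if major == 2 else "the course uses Python 2"
    return "Python {}.{} ({})".format(major, minor, note)


if __name__ == "__main__":
    print(describe_python())
```

Save it, navigate to its directory, and run it with the command shown above. If the summary reports a major version other than 2, your python command points at a different interpreter than the one the course expects.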
    
    If you prefer an IDE, Canopy (used in CS111) is a safe -- if somewhat bloated -- option.

    Installation on Windows

    The simplest option is to use an IDE such as Canopy, unless you're familiar with Python IDLE or using the Windows command prompt.

    Required Python Libraries

    Python libraries are handled differently depending on whether you're using Canopy or a text editor with the command line.

    Canopy users: go to Tools > Package Manager. Do a search for "numpy", which will bring up a package with that name and indicate whether it is installed or not (it usually is). Similarly, search for "scipy", "matplotlib" and "scikit_learn". The last one may not be installed, so search for it under the "Available" packages and click the button to install it.

    Non-Canopy users must install pip, a package manager, following the instructions in the link. Then, at the command line, type the following sequence of commands.

    pip install --upgrade numpy
    pip install --upgrade scipy
    pip install --upgrade matplotlib
    pip install --upgrade scikit-learn
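
Once the commands finish, one way to confirm that the four libraries are importable is a short script like the sketch below. It is not the check script from the course repository; the names REQUIRED and check_libraries are invented for this example. Note that scikit-learn is imported under the module name sklearn.

```python
from __future__ import print_function  # so print() behaves the same on Python 2 and 3
import importlib

# Import names of the four libraries installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]


def check_libraries(names=REQUIRED):
    """Map each module name to its version string, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Some modules do not define __version__; report "unknown" for those.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(check_libraries().items()):
        print(name, "->", version if version else "NOT INSTALLED")
```

Any library reported as NOT INSTALLED is worth reinstalling with the matching pip command before moving on.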
    

    IPython Notebooks

    If you took CS111 in Python, you're already familiar with Notebooks, which allow code and commentary to be interleaved. You needn't do anything more if you're using Canopy. Non-Canopy users must install jupyter with pip, like so:

    pip install --upgrade jupyter
    

    2.2 Git and GitHub

    You should have signed up for a GitHub account by the time you submitted the survey. Git is a version control program (similar to Mercurial, which you may have used in CS240), and GitHub is a website that hosts Git repositories, with additional bells and whistles such as access permissions, the ability to fork other people's projects, wikis, issue trackers, statistics, etc.

    We're using version control for assignment submissions in this class because:

    1. It makes partner and team work easier.
    2. It simplifies the workflow of getting starter code, turning in your submissions, and receiving feedback.
    3. It gets you into the habit of using version control, which is good practice and a necessary skill for a computer scientist.

    Installation on Macs

    If you have installed Developer Tools or Xcode in the past, chances are that Git is already installed. Type git in Terminal to check.

    Otherwise, download and install it from this link.

    Installation on Windows

    Download and install Git from this link. If you prefer interacting with Git on the command line, Git BASH is a standalone command line tool.

    Optional: GUIs for Git

    There are several GUIs available for Git if you're not a fan of the command line, such as GitHub Desktop.

    Git Commands to Know

    Avoid using features like branches and pull requests for the problem sets unless you're already comfortable with them. The key features you must learn are cloning, pulling changes, adding and committing changes, and finally, pushing your local commits to the server.
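
To make the order of those steps concrete, the sketch below prints the commands you would type at the command line, one per line. It does not run Git itself. The function name git_workflow, the file name answers.py, and the commit message are placeholders made up for this example; the repository URL is the one listed at the top of this page. Whether you push to that repository or to your own copy depends on how submissions are set up, which this page does not specify.

```python
from __future__ import print_function  # so print() behaves the same on Python 2 and 3


def git_workflow(repo_url, changed_file, message):
    """Return the basic Git command sequence as a list of strings."""
    # git clone creates a folder named after the last part of the URL.
    folder = repo_url.rstrip("/").split("/")[-1]
    if folder.endswith(".git"):
        folder = folder[:-len(".git")]
    return [
        "git clone {}".format(repo_url),       # one time: copy the repository locally
        "cd {}".format(folder),                # move into the cloned folder
        "git pull",                            # fetch and merge any new changes
        "git add {}".format(changed_file),     # stage the file you edited
        'git commit -m "{}"'.format(message),  # record the staged change locally
        "git push",                            # send your local commits to the server
    ]


if __name__ == "__main__":
    # answers.py and the message are hypothetical placeholders.
    for command in git_workflow("https://github.com/wellesleynlp/ps0",
                                "answers.py", "Finish PS0"):
        print(command)
```

Cloning happens once per repository; the pull, add, commit, and push steps repeat each time you work on an assignment.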

    2.3 Python and NumPy Tutorial

    Clone the linked repository to your computer and follow the instructions in the README to check your libraries and run the tutorial. If you're not able to open or run the tutorial notebook due to some unsuccessful installation, read through it here.