CS 349: Natural Language Processing

Spring 2016, Wellesley College

Final Project Guidelines

The final project is your chance to implement that cool app you've always dreamt of, or dive into NLP research. Your idea can build on an assignment or topic from class -- or go in an entirely different direction, as long as it is primarily related to NLP. Draw from your own interests and hobbies.

Ideally, you will end up with a program that demonstrates your mad NLP skillz to potential employers, or a research paper that will make a graduate admissions committee sit up.

While the quizzes and assignments help you master the machinery we learn in class, the project is about applying this knowledge independently -- which is a far more important learning goal in the long run.

See the Links page for resources.

Logistics

The project should be done in teams of 2-3. Talk to me if you'd prefer to work on your own.

There is no restriction on which programming language you code in, or what datasets and external tools you use. However, you do have to submit all of your code and supporting data with your final submission.

You may run into difficulties once you start working, in which case it's totally fine to change the plans you set out in the proposal. Just try to figure this out early, well before the project update is due, so you're not rushing at the end.

Evaluation

The project is worth 20% of the course grade: about the same as the quizzes put together, or 2.5 assignments. I recommend you spread the work out over the time span of the project, doing a little chunk each week.

Proposal + Pitch 15% March 03
Project Update 5% April 04
Peer Review 5% May 02
Final Paper/Product 75% May 12
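To make the weights concrete, here is a small Python sketch that multiplies out the percentages above. It assumes the components are combined as a plain weighted average of scores on a 0-100 scale; the function names and the example scores are hypothetical, for illustration only.

```python
from typing import Dict

# Percentages copied from the Evaluation section. Integers keep the arithmetic exact.
PROJECT_PERCENT_OF_COURSE = 20
COMPONENT_PERCENTS: Dict[str, int] = {
    "Proposal + Pitch": 15,
    "Project Update": 5,
    "Peer Review": 5,
    "Final Paper/Product": 75,
}


def course_share(component: str) -> float:
    """Percentage of the overall course grade carried by one project component."""
    return PROJECT_PERCENT_OF_COURSE * COMPONENT_PERCENTS[component] / 100


def project_score(scores: Dict[str, float]) -> float:
    """Weighted project score (0-100), assuming a plain weighted average
    of per-component scores that are each on a 0-100 scale."""
    total = sum(pct * scores[name] for name, pct in COMPONENT_PERCENTS.items())
    return total / 100


def course_points(scores: Dict[str, float]) -> float:
    """Points (out of 20) that the project contributes to the course grade."""
    return PROJECT_PERCENT_OF_COURSE * project_score(scores) / 100


if __name__ == "__main__":
    for name in COMPONENT_PERCENTS:
        print(f"{name}: {course_share(name):.1f}% of the course grade")
    # Hypothetical scores, for illustration only.
    example = {"Proposal + Pitch": 90, "Project Update": 100,
               "Peer Review": 100, "Final Paper/Product": 80}
    print(f"Project score: {project_score(example):.1f}")
    print(f"Course points: {course_points(example):.1f} / 20")
```

Multiplying out the percentages this way shows, for instance, that the Final Paper/Product alone is worth 15% of the course grade (75% of 20%), while the Proposal + Pitch is worth 3%.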

Project outcomes will be evaluated on (1) amount of demonstrated effort, (2) depth of research into related technologies, (3) creative problem-solving, and (4) understanding of the underlying NLP/machine learning.

You will not be penalized if your results fall short of expectations -- after all, that's just the uncertainty of research and development. In other words, as long as you write correctly working programs that implement sensible models, and can show that you made an effort to try multiple ideas, it's okay if your accuracies are not high.

However, it will affect your grade if things don't work simply because your code has bugs, if you show a conceptual misunderstanding of the algorithms you're implementing (especially if they're models from class), or if you didn't spend the time to try multiple approaches. For this reason, it's a good idea to check in with me informally to stay on track.

Project Proposal

Due by 11:59 pm Thursday, March 03 into this Google Drive directory. Presentation in class on Thursday; slides due in the same Google Drive folder by noon.

The project proposal sets out your topic and goals. The proposal outline and pitch together are worth 15% of your project grade. Grades are based on clarity, organization, depth of research into prior work, and writing and presentation style.

By 11:59 pm on March 03, upload a 2-page document to this Google Drive folder, named student1name_student2name_student3name_proposal, either as a Google Doc or a PDF.

You will also give a 2-3 minute lightning talk or "pitch" describing your idea. Submit your presentation slides as a Google Slides document into the same folder by noon on March 3rd.

Topic

The most important criterion for choosing a topic is that it genuinely excites you. Don’t be afraid to get creative. The second is feasibility -- you have just over two months, so set your goals realistically. It goes without saying that your topic should also be (1) directly related to natural language processing, though not necessarily something we’ve done in class, and (2) awesome.

Projects generally involve research or software development.

  • Research: Identify a task involving language, and develop an algorithm to solve the problem. The task by itself could be novel, or you could explore new ways of attacking it compared to existing work. The emphasis of a research project is generally on coming up with a solution and measuring its performance. The solution can be entirely original, combine techniques from previous approaches, or involve some non-trivial tweaks that improve an existing approach. You can also choose to survey and compare current methods on a common dataset.

    You must read at least a few academic papers that tackle similar problems, and compare your results to previous work. Your final deliverable will be an 8-page paper summarizing the literature, your methods, and your experimental results, along with supporting data and code.
  • Software: Build a usable program for a task. Unlike research projects, the emphasis is on the final product, so you may implement or adapt existing algorithms, but as far as possible, your topic should have an original twist. It's no fun creating a program that already exists!

    Topics can run the gamut: games and apps related to language, websites that provide useful functionality by working with APIs and data, programs that make a research result accessible to the public through a user-friendly implementation, pedagogical tools, or libraries of NLP-related algorithms for programmers. (You can even choose to contribute to an existing open source project like NLTK.)

    The final deliverable will be a working program, as well as a 4-page writeup describing related work and the algorithms you used, and a summary of how well the program meets your goals.
These categories are not mutually exclusive -- many projects will end up being some mix of the two. I highly encourage you to discuss your ideas with me before you submit the proposal. We can brainstorm together to narrow down a topic, and find references and data.

Proposal Outline

Your proposal document should be about two pages long and include:

  1. A description of your problem and motivations.
  2. A brief survey of existing work.
  3. A description of your proposed solution(s), including the data and tools you will be using.
  4. Description of work you have already completed. This section can be short but not empty. Plan to start reading the background literature and brainstorming before you submit the proposal.
  5. Responsibilities of each team member. These may evolve over the course of your project, but it helps to have a plan.
  6. Three milestones. Be realistic, yet ambitious.
    • What to complete by April 4th, when the project update is due
    • The minimum desired outcome of your project by the final submission on May 12th
    • The ideal final outcome of your project

Project Pitches

Prepare an entertaining and informative presentation for March 03, consisting of a 2-3 minute talk, plus a couple of minutes for audience feedback. Imagine you’re pitching your idea to an audience of venture capitalists or scientific funding agencies (or if you prefer a less monetary viewpoint, an academic thesis committee). Describe your proposed question, and convince us that it’s the most interesting/urgent/necessary problem in the world, and that you have the ultimate solution.

You need not use slides, but if you do, please submit your slide deck as a Google Slides document to the shared folder by noon on the day of the presentation.

Repository

Please create a repository at github.com/wellesleynlp/student1name_student2name_student3name-finalproject, giving write access to all members of your team and to me. You will use this for code and ongoing work related to the project, as well as the final submission. (As usual, avoid pushing large static datasets.) The repo can still be empty when you submit the proposal. You may make it private or public.

Project Update

Due by 11:59 pm Thursday, April 07 to your repository.

Add a section to README.md in your repo, titled "Project Update" or something of the sort. Describe your progress, touching upon these points.

  • Did you meet your first milestone? Did you change your milestone?
  • What have you finished so far? Include background reading, development, brainstorming, and results.
  • You should aim to have *some* results by the update. Describe them.
  • Are the results satisfactory? If they aren't, what do you plan to modify/add?
  • Updates to your project plan, like goals and techniques that have changed since your proposal.
  • Difficulties or questions that I can help you with. If you're having code-related problems, note the file name and line numbers.

Peer Review

Due by 11:59 pm Friday, May 06 to this form.
Pair up with another team. Each team should describe/demo their results so far, and write a critique of the other team's work. Find your pairs using this spreadsheet.

Final Paper/Product

Due by 11:59 pm Saturday, May 14 to your repository.

Your final submission will take one of two forms: a usable software program, or a research paper.

Usable Program

This is any project whose primary outcome is the development of a working program with a coherent goal, rather than an exploration of data. Most of your projects fall into this category.

Your code must run without errors. Either write clear instructions in your README showing how to run your program, or provide a link to a public web implementation, if you have one.

Include all supporting data files. Git LFS is a nice way to manage large files. You can also put the files on Google Drive, tempest, or the web, and include the link.

In addition, push a ~1200-word writeup in PDF, HTML, or Markdown format to your repository, or write it as a Google Doc/public webpage and give the link in your README. This writeup should include:

  1. A description of the motivating problem.
  2. A brief survey of related research, citations to the algorithms you implement, and a description of similar existing products.
  3. An explanation of your approach.
  4. Analysis of any shortcomings of your program, and ideas for future development.

Research Paper

These are projects whose objective is more exploratory than developmental, like analyzing data with different NLP techniques.

Include all your code, but it's fine if your programs are scattered across files (unlike software projects which should have a coherent structure and interface). Describe briefly what each file does in your README.

Push a ~2000-word writeup in PDF or HTML format to your repository, or write it as a Google Doc/public webpage and give the link in your README. This writeup should include (in no particular order):

  1. A description of your data, problem and motivations.
  2. An overview of existing work.
  3. A description of your algorithms, and why you chose them.
  4. The results of your experiments, including numbers, visualizations, and interpretations as appropriate.
  5. Analysis of any shortcomings of your work, and ideas for future research.

The order of topics and word-counts are just guidelines in both cases.