Team Project Guidelines
1.0 Introduction
The team project component of SIADS 593 Milestone I is designed to encourage you to both apply and extend the skills, tools, techniques, and knowledge that you’ve gained in your prerequisite MADS courses to a new problem.
Overall, the goal of this project is to provide you with an opportunity to bring your unique interests, creativity and ingenuity into alignment with one or two team members and demonstrate your team’s ability to apply data science skills and concepts on public datasets of your choice.
In particular, the project should show off your team’s ability to take a minimum of two data sets possessing different features and/or access methods and clean and manipulate them in order to extract a useful byproduct: something more than you could have gotten from either data set by itself. The cleaning and manipulation could involve filtering, format conversion, handling missing or noisy data, matching records from one data source with corresponding records in the other, and so on, and must make significant use of the programming tools and techniques covered in prior MADS courses.
❗ You are not expected to use advanced tools and techniques from courses that are not prerequisites of Milestone I (e.g., machine learning, causal inference, etc.), and in fact you are discouraged from doing so as neither your peers nor your instructors may be in a position to offer helpful advice.
There are several steps involved in the project:
- Problem formulation
- Team formation
- Selection of data sources
- Project proposal creation
- Faculty mentor assigned (by the teaching team)
- Peer feedback on project proposals
- Final project report + code
- Peer feedback on project reports
2.0 Problem formulation and team formation
Team formation and the selection of project topics and datasets are student responsibilities. We suggest that you advertise your interests via Slack and coordinate via the Project proposal and team tracker Google Sheet (see Coursera week 01 readings for the link to the document).
We expect that all teams will consist of two to three (2-3) students. The maximum team size is three (3) students. Working alone or working in groups of more than three is not permitted. It has been our experience that working in pairs or groups of three balances high-quality work with the amount of work each person needs to do to complete the project. Each of the steps is described below.
Before starting work on the actual project, you’ll write a short project proposal that is meant to be a high-level summary and does not need to contain technical details or code.
Once you have shared your project proposal, a faculty member from the instructional team will be assigned to your team as your primary mentor. This is the person who will help guide you through the project and you should expect to be in weekly contact with them.
You will then review your peers’ proposals, and they will review yours. You can then fine-tune your project based on the feedback you receive. Requiring the proposal to be submitted early is intended to get you thinking about questions and datasets you’re interested in, and how those might be answered with the tools you’ve been exposed to in previous classes.
3.0 Proposal (30 points)
You must use a copy of the project proposal template to create your project proposal (see Coursera week 01 readings for the link to the document, which you must copy).
Your proposal will be reviewed by a minimum of two peers from the class and your team is encouraged to take their feedback into consideration when you work on your project.
❗ When your project proposal is ready for review please provide a link to it via the project tracker and coordination spreadsheet. Be sure to enable commenting on your Google Doc in order to permit peer reviews.
Rubric: Proposal
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
Clear, concise proposal; all sections of form completed. | Most sections complete; some sections too brief or not obviously relevant. | Notable errors or omissions in the proposal. |
Evidence of having thought deeply about the nature of manipulation, analysis, visualization, and ethical considerations. | Evidence of having considered in a general way the nature of manipulation, analysis, visualization, and ethical considerations. | No evidence of having thought about the nature of manipulation, analysis, visualization, and ethical considerations. |
4.0 Proposal peer review (30 points)
You must sign up for and review a team project proposal. Note that your proposal grade will not be based on the content of your peers’ reviews.
Peer reviews will take the form of comments on the Google Doc containing the proposal and will be coordinated via the Google sheet.
The purpose of the peer review is threefold:
- gain experience reviewing a proposal,
- learn about other work going on in class,
- get feedback on how to make your project better.
Your review comments taken together should span several paragraphs and exhibit the following points:
- professional: what would a co-worker think about your review?
- pleasant: courtesy goes a long way
- helpful: what sort of advice would you want?
- scientific: focus on facts, not opinions
- realistic: keep scope in mind
- empathetic: how would you feel if you received the review you wrote?
- organized: make it easy for the recipient to follow your train of thought
💡 A useful approach when writing a peer review is the “two stars and a dog” approach. In other words, highlight two things that the authors did well and identify one area where they might spend some time improving their work (and make constructive suggestions about how to do so).
Rubric: Proposal peer review
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
Review completed. | Review completed. | Review not completed. |
Review provides constructive feedback and actionable advice. | Review is helpful, but lacks constructive feedback and/or actionable items. | Review, if completed, is not particularly helpful to authors. |
Review meets all criteria listed above (professional, helpful, pleasant, scientific, realistic, empathetic and organized). | Review meets nearly all review criteria. | Review does not meet review criteria. |
5.0 Team collaboration (30 points)
Collaboration is an important skill to develop as data scientists. We will look for evidence of sustained collaboration that leads to a final project that is more than any one individual could have produced on their own.
Rubric: Team collaboration
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
Clear evidence, drawn from the content of the progress meetings and check-ins, as well as the Statement of Work, of sustained collaboration among team members. | Some attempt at cooperation, typically a result of applying a “divide-and-conquer” strategy to accomplish the work. Little evidence of sustained collaboration. | Little or no evidence of the ability to work with others in a team. |
6.0 Status updates (30 points)
It has been our experience that keeping in regular contact with your faculty mentor is the best way to ensure success with your project, so we encourage you to periodically check in with them. Ideally, these check-ins will take the form of meetings held at least once every two weeks. Alternatively, you can check in using chat on Slack, but at least one video-based synchronous meeting (e.g., Zoom, Google Hangouts) is required.
Rubric: Status updates
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
A minimum of three video-based synchronous meetings plus one or more Slack-based check-ins completed. | One video-based synchronous meeting, plus two or more Slack-based check-ins completed. | Fewer than one video-based synchronous meeting and/or fewer than two Slack-based check-ins. |
Students well-prepared for meeting, showing progress to date and identifying challenges for next steps. | Students summarize progress, but no mention of next steps. | Team members generally unprepared for meeting. |
7.0 Project report (450 points)
First and foremost, following the individual original work policy stated at the start of the course, the topic and questions you ask in your project must be of your own invention. If you used ideas from a particular web site or previous project, or did your project as part of an existing research collaboration, you must identify your sources and/or collaborators and provide links and citation(s) where appropriate.
7.1 Report length
The report must be no more than 11 pages in length (inclusive of code snippets, visualizations, references, etc.). The main body of the report, including title, must not exceed 10 pages in length (page 11 is limited to the statement of work and endnotes).
The final report must be delivered as a PDF. Font size must be at least 10 point and margins must be at least 0.5" all around. Avoid the use of a cover page. Instead, include the title and authorship at the top of the first page. The page limit will be strictly enforced: we will only consider the first 11 pages when assessing your report.
7.2 Report format
Pages 01-10: main body of report (including title and authors)
Page 11: statement of work, endnotes (if any)
The format of the report is semi-flexible — you can include additional information (keeping in mind the page limit), but at a minimum the report must include the following sections. Please use section headings in your report that correspond to the following sections, other than “Spelling, grammar and style”.
7.2.1 Motivation (15 points)
Briefly state the nature of your project and why you chose it. What specific question or goal did you try to address?
Rubric: Project motivation
Great (14-15 pts) | Good (12-13 pts) | Not Good (0-11 pts) |
---|---|---|
Adequate context is provided to the reader so they understand why the work was undertaken. | Some context, but not altogether clear why the problem was chosen. | Little to no context provided. |
Multiple references to similar studies or reports. | One or two references to previous or similar studies or reports. | No reference to previous or similar work. |
Compelling statement of proposed work. | Reader is left uninspired by the report narrative. | Reader is presented with questions to be answered but without any explanation of why they’re being asked in the first place. |
7.2.2 Data sources (30 points)
Describe the properties of the datasets or API services you used (two at a minimum). Be specific. Your information at a minimum should include but not be limited to:
- where the datasets or API resources are located,
- what formats they returned/used,
- what were the important variables contained in them,
- how many records you used or retrieved (if using an API), and
- what time periods they covered (if there is a time element)
For example, if you downloaded data or used API services, you should state the specific URLs to those files or resources. It should require zero effort on the part of the teaching team to find and access the exact resources you used if we need to do so.
Rubric: Project report data sources
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
Descriptions are sufficient for the reader to understand the magnitude and nature of the data. | Data sources are mentioned but access methods not specified; magnitude/size of data is vague. | Unclear description of how data was obtained. |
Variables of interest are adequately described, access methods detailed in a way that would allow the reader to access the data. | Variables are mentioned but not described. | Missing descriptions of variables deemed to be important for the project. |
7.2.3 Data manipulation methods (100 points)
For each of your two sources, describe how you manipulated the data. Briefly describe the workflow of your source code and what the main parts do. Other questions that you may want to address might include the following. Note that not all questions are applicable to all projects, nor is this an exhaustive list of topics to include in this section.
- How specifically did you need to manipulate the data?
- How did you handle missing, incomplete, or incorrect data?
- How did you perform conversion or processing steps?
- What variables and steps did you use to join the two data resources to perform your data analysis?
- What challenges did you encounter and how did you solve them?
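As an illustration of the kind of workflow the questions above ask you to describe, here is a minimal sketch that joins two sources on a cleaned key, handles a missing value, and reports records that failed to match. It uses only the Python standard library so that it runs anywhere; the county names, counts, column names, and helper functions (`normalize_key`, `load_rows`, `to_int_or_none`) are all hypothetical and invented for this example, and your own project will likely rely on the tools covered in your prerequisite courses instead.

```python
import csv
import io

# Hypothetical inline data standing in for two real sources (all values invented).
POPULATION_CSV = """county,population
Alpha ,200000
beta,500000
Gamma,
Epsilon,100000
"""

LIBRARIES_CSV = """County,libraries
ALPHA,10
Beta,20
gamma,3
Delta,7
"""


def normalize_key(raw):
    """Trim whitespace and lowercase so keys from both sources compare equal."""
    return raw.strip().lower()


def load_rows(text, key_field):
    """Read CSV text into a dict keyed by the normalized join key."""
    reader = csv.DictReader(io.StringIO(text))
    return {normalize_key(row[key_field]): row for row in reader}


def to_int_or_none(value):
    """Convert a numeric string to int; treat empty or missing values as None."""
    value = (value or "").strip()
    return int(value) if value else None


population = load_rows(POPULATION_CSV, "county")
libraries = load_rows(LIBRARIES_CSV, "County")

# Inner join on the normalized key, keeping track of what did not match.
matched_keys = sorted(population.keys() & libraries.keys())
only_in_population = sorted(population.keys() - libraries.keys())
only_in_libraries = sorted(libraries.keys() - population.keys())

combined = []
for key in matched_keys:
    people = to_int_or_none(population[key]["population"])
    branches = to_int_or_none(libraries[key]["libraries"])
    # Derive a value that neither source provides on its own.
    # Leave it as None when the population is missing or zero.
    per_100k = None
    if people and branches is not None:
        per_100k = round(branches / people * 100_000, 2)
    combined.append({
        "county": key,
        "population": people,
        "libraries": branches,
        "libraries_per_100k": per_100k,
    })

for row in combined:
    print(row)
print("Only in population source:", only_in_population)
print("Only in libraries source:", only_in_libraries)
```

Reporting the unmatched keys and the rows left with a missing derived value, rather than silently dropping them, makes it easier to explain in your report how missing or incomplete data was treated.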
Rubric: Project report data manipulation methods
Great (90-100 pts) | Good (80-89 pts) | Not Good (0-79 pts) |
---|---|---|
Depth, coherence and clarity of explanation of methods used. | Good descriptions, but some parts unclear or confusing for the reader. | Poor explanation of what was done to the data. |
Appropriate methods used for the nature of the work in the project. | Methods chosen for some tasks are not entirely appropriate. | Inappropriate techniques used in the manipulation of data. Bias introduced (e.g., not treating missing data properly, etc.). |
Sufficient guidance provided that allows the reader to re-create the analyses. | Sufficient guidance provided that allows the reader to re-create the analyses. | Insufficient guidance provided to allow the reader to re-create the analyses. |
7.2.4 Analysis (60 points)
A key goal of this project is bringing together two different data resources to answer an interesting question or find a new insight that could not have been answered with either data resource alone (which you summarized in the previous section). Now describe the analysis steps you performed on your combined dataset to address that goal/question. Be specific, and include references to key functions or parts of your code.
- What interesting relationships or insights did you get from your analysis?
- What didn’t work, and why?
Rubric: Project report analysis
Great (55-60 pts) | Good (48-54 pts) | Not Good (0-47 pts) |
---|---|---|
Analyses were appropriate for the data. | Analyses were appropriate, although alternatives exist that should have been investigated. | Analyses conducted were inappropriate for the data or the questions posed about the data. |
Description of the analysis is clear and in sufficient detail so as to allow the reader to understand what was done. | Descriptions mostly clear, but some confusing and/or unclear parts. | Descriptions unclear or perfunctorily done. |
The conclusions are consistent with what the analyses showed. | Conclusions mostly consistent with results of analyses. | Conclusions missing or not consistent with results of analyses. |
7.2.5 Visualizations (50 points)
Rubric: Project report visualizations
Great (45-50 pts) | Good (40-44 pts) | Not Good (0-39 pts) |
---|---|---|
Visualizations are appropriate, effective and expressive. | Visualizations are appropriate, although some more effective or expressive choices could have been made. | Visualizations are not appropriate for the nature of the analysis in the report. |
Visualizations are complete, including appropriate title, axis labels, etc. Visualizations are annotated appropriately (note: not all visualizations need annotations). | Some visualizations are incomplete (e.g., missing titles, axis labels, etc.). Some visualizations not referenced in the text. | Visualizations are incomplete or missing. |
All visualizations referenced and explained in the text. | Some visualizations not referenced in the text. | Visualizations (or their components) are not legible. |
7.2.6 Statement of work (15 points)
Page 11 comprises a statement that describes the contribution that each team member made to the project. You should explain how you collaborated or cooperated with each other. Page 11 can also be used for the inclusion of endnotes, if any.
Rubric: Project report statement of work
Great (14-15 pts) | Good (12-13 pts) | Not Good (0-11 pts) |
---|---|---|
Clear, concise statement of who did what. Assessment of how collaboration went, including statement about how to improve collaboration in future work. | Somewhat unclear statement about which parts of the project were worked on by which team members. Statement about how collaboration went without mention about how to improve future work. | No clear statement about who worked on what or how collaboration went. |
7.2.7 Spelling, grammar, and style (30 points)
The best papers, and in some cases even some of the best ideas, can be undone by inattention to spelling, grammar, and style. Please check your report for spelling and grammar (most word processing and presentation software has built-in spelling and grammar checkers). We strongly suggest that you proofread your work with “fresh eyes” – that is, take a break from your writing, step away, and then return to review your work. For matters of style, we strongly recommend following the advice of William Strunk and E.B. White (the essayist and author of such children’s favorites as Stuart Little and Charlotte’s Web) in their classic work The Elements of Style.
Rubric: Project report spelling, grammar, and style
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
No spelling or grammatical mistakes. | Minor spelling and/or grammatical errors. | Errors in spelling, grammar or style detract from the overall report. |
Report is well-structured, logically flows from one section to another. | Some problems with logical structure that detract from the report. | Report is structured poorly or not at all. |
Length is appropriate. | Length is appropriate. | Report is either too long or too short. |
Report is submitted in correct format (see description in Section 7.3 below). | Report is submitted in correct format. | Report is not submitted in correct format. |
7.3 Code (150 points)
Please submit “polished” Jupyter *.ipynb notebooks and supporting Python *.py modules (if any). The notebooks must:
- Present a coherent, computational narrative of your data manipulation and analysis.
- Run without errors from top to bottom.
- Run without warnings from top to bottom.
- Include a mix of code and Markdown cells.
- Include comments in the code, whenever appropriate.
- Avoid generating excessive output (i.e., more than 10-15 lines per cell).
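One possible way to check the "no warnings" requirement while you are still developing is to record any warnings a step emits and fix their causes, rather than discovering them on the final top-to-bottom run. The sketch below uses the standard library `warnings` module; `noisy_step` and its message are hypothetical stand-ins for one of your own notebook steps.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a notebook step that triggers a warning."""
    warnings.warn("this option is deprecated", DeprecationWarning)
    return 42


# Record every warning raised inside the block instead of letting it pass by.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = noisy_step()

# List what was caught so the underlying cause can be fixed.
for item in caught:
    print(f"{item.category.__name__}: {item.message}")
```

The aim is to find and fix the cause of each warning; blanket suppression hides problems that a reader re-running your notebook in a different environment may still encounter.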
Deciding on the number of notebooks to produce is a team decision. You might choose to separate the data acquisition, cleaning and manipulation, and analysis tasks into multiple notebooks in order to narrow the code focus and control for length. A potential downside is code repetition across the notebooks, the likelihood of which increases for the inexperienced and those working under tight time constraints (adoption of code modularization strategies can mitigate this concern). Another option is a single notebook. Opting for this approach places a premium on notebook organization in order to ensure that the reader is able to sensibly traverse an otherwise sprawling set of cells. Regardless of the number of notebooks, the code must be organized in a manner that facilitates comprehension and reuse.
If you have yet to do so consider adopting the following best practices:
Topic | Source(s) |
---|---|
Python code styling | Kenneth Reitz, ed. PEP 8 Style Guide for Python Code. Adapted from the original Python Enhancement Proposal PEP 8. |
Python code refactoring | Nick Thapen. Python Refactoring, parts 1, 2, 3, 4, 5, 6, 7, and . Sourcery. 2020-2022. |
Jupyter notebook design | Adam Rule, et al. Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLOS Computational Biology. 25 July 2019. |
Data analytic styling | Jeff Leek. The Elements of Data Analytic Style. Leanpub. 2014-2015. |
DRY principle | Wikipedia. Don’t repeat yourself. |
Scientific writing | George Gopen and Judith Swan. The Science of Scientific Writing. American Scientist. Nov-Dec 1990. |
Rubric: Project report code
Great (135-150 pts) | Good (120-134 pts) | Not Good (0-119 pts) |
---|---|---|
EDA process including cleaning and manipulation steps is well-documented. | EDA process including cleaning and manipulation steps is largely documented but gaps exist. | EDA process including cleaning and manipulation steps is poorly documented. Code fails to cover one or more topics. |
Dependencies are recorded. | Dependencies are largely recorded but gaps exist. | Dependencies are not recorded. |
Notebook(s) is/are structured appropriately. | Notebook(s) is/are structured but not always appropriately. | Notebook(s) lacks/lack a logical structure. |
Cells are well-annotated including use of Markdown text cells to help guide the reader through the notebook. | Cell annotations are provided but not always consistently or clearly. | Cell annotations are weak or non-existent. |
Code is modularized. | Code modularization is good but could be improved. | Code is not modularized. |
Code readability is enhanced by adherence to PEP 8 styling guidelines. | Code readability occasionally strays from PEP 8 styling guidelines. | Code does not adhere to PEP 8 styling guidelines. |
Visualizations are expressive and effective; marks, labels and color palette are appropriate. | Visualizations are generally expressive and effective but in some cases choice of visualization type, marks, labels, and/or color palette is suboptimal. | Visualizations are absent or are insufficiently expressive and effective. Code fails to cover one or more topics. |
No warnings or errors are encountered when notebook(s) is/are run. | Notebook(s) is/are run successfully (no runtime errors) but warnings are encountered. | Warnings and errors are encountered when notebook(s) is/are run. |
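The rubric above asks that dependencies be recorded but does not prescribe a method. One possible approach, sketched here with the standard library `importlib.metadata` module (Python 3.8+), is to print the installed version of each package the notebook relies on. The `PACKAGES` list and the `describe_dependencies` helper are assumptions made for this example; substitute the packages your project actually imports.

```python
import platform
from importlib import metadata


def describe_dependencies(package_names):
    """Return one line per package giving its installed version, if any."""
    lines = [f"# python {platform.python_version()}"]
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


# Hypothetical list of packages; replace with the ones your project uses.
PACKAGES = ["pandas", "numpy", "matplotlib"]

print("\n".join(describe_dependencies(PACKAGES)))
```

Placing a cell like this near the top of a notebook, or saving its output alongside your code, gives a reader the versions needed to reproduce your environment.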
7.4 Report submission
❗ Please submit the team project ZIP archive on or before the due date. Failure to do so will trigger late penalties as described in the Syllabus.
Team project assets must be submitted in a *.zip file. Structure the ZIP archive as follows:
Requirements
- Zip archive filename: You must use your Coursera project team name as the name of your ZIP file, formatted as follows:
  Zip archive: < team number >-< uniqname >-< uniqname >[-< uniqname >]_[YYYY][fall | winter | sprsum].zip
  Example: 00-arwhyte-cteplovs-2023winter.zip
- Team project report: Your project report must be named the same as your Coursera team name and must be provided as a PDF (e.g., 00-arwhyte-cteplovs-2023winter.pdf). Remember, the project report must be no greater than 11 pages in length. Keep this in mind if you are planning to generate the PDF from a Jupyter notebook. We strongly recommend that you use Google Docs or Slides, Microsoft Word or PowerPoint, or some other word processing package to generate your final PDF.
- Source code and data: Place all source code files (e.g., Jupyter notebooks, Python scripts, or other code) in a src/ directory in your zip file. Place all data files in a src/data/ directory in your zip file.
  ❗ Do not submit large data files as part of your zip file but be sure that the teaching team and your classmates can access the data. If a data file is over 10 MB or not available in file form, create a sample file containing the first 100 records. If data was retrieved via an API, document the URL endpoint(s) in the relevant Jupyter notebook(s) or Python script(s).
  💡 Choose names for your notebook(s), Python *.py script(s), and/or module(s) that convey to the reader each file’s purpose.
ZIP archive structure
    00-arwhyte-cteplovs-2023winter.zip
        src/
            some_notebook.ipynb
            another_notebook.ipynb
            some_script.py
            some_module.py
            ...
            data/
                some_dataset.csv
                another_dataset.json
                sample_data.csv
                ...
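For the sample file mentioned in the requirements above, the following is one possible sketch that copies the header row plus the first 100 records of a CSV file into src/data/. It assumes the source is a CSV with a single header row; the `write_sample` helper and the file names are hypothetical, and the demo generates its own 250-row file in a temporary directory so that it runs on its own.

```python
import csv
import itertools
import tempfile
from pathlib import Path


def write_sample(source_path, sample_path, n_records=100):
    """Copy the header row plus the first n_records data rows to sample_path.

    Returns the number of data rows written. The source is streamed row by
    row, so a large file is never loaded into memory in full.
    """
    source_path, sample_path = Path(source_path), Path(sample_path)
    sample_path.parent.mkdir(parents=True, exist_ok=True)
    with source_path.open(newline="", encoding="utf-8") as src:
        with sample_path.open("w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            header = next(reader, None)
            if header is None:  # empty source file
                return 0
            writer.writerow(header)
            count = 0
            for row in itertools.islice(reader, n_records):
                writer.writerow(row)
                count += 1
            return count


# Demo with a generated 250-row file standing in for a large dataset.
with tempfile.TemporaryDirectory() as tmp:
    full_path = Path(tmp) / "full_dataset.csv"
    with full_path.open("w", newline="", encoding="utf-8") as f:
        demo_writer = csv.writer(f)
        demo_writer.writerow(["id", "value"])
        demo_writer.writerows([i, i * i] for i in range(250))

    sample_path = Path(tmp) / "src" / "data" / "sample_data.csv"
    written = write_sample(full_path, sample_path)
    sample_lines = sample_path.read_text(encoding="utf-8").splitlines()

print(f"Wrote {written} records ({len(sample_lines)} lines including header)")
```

For data that is not in CSV form (JSON, database extracts, API responses), the same idea applies, but the reading and writing steps would need to match that format.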
You will also be required to submit your team project report PDF to a gallery tool that will facilitate the sharing of your report with the class. The details of this will be announced during the course.
7.5 A word about faculty review of pre-submission versions of reports
In the past, some teams have requested a review of their final project before it is submitted. We cannot provide detailed feedback on reports, although we are happy to respond to specific questions about the project as part of the regular check-in process described in an earlier section of this document.
8.0 Project report peer feedback (30 points)
You will have an opportunity to provide feedback on a project report, ideally the report whose proposal you reviewed previously. We will likely use a gallery tool built by the Center for Academic Innovation (CAI). More information will be provided regarding the review mechanism later in the course. The rubric for this component is identical to the one for the project proposal peer feedback.
Rubric: Project report peer feedback
Great (27-30 pts) | Good (24-26 pts) | Not Good (0-23 pts) |
---|---|---|
Review completed. | Review completed. | Review not completed. |
Review provides constructive feedback and actionable advice. | Review is helpful, but lacks constructive feedback and/or actionable items. | Review, if completed, is not particularly helpful to authors. |
Review meets all criteria listed above (professional, helpful, pleasant, scientific, realistic, empathetic and organized). | Review meets nearly all review criteria. | Review does not meet review criteria. |
9.0 Technology choices
This course differs from other MADS courses in many ways including technology. We have created a Jupyter environment for you that is functionally equivalent to SIADS 516, which is a superset of the base MADS environment, and you can access that environment via the “ungraded lab assignment” in Coursera. You can use that environment or choose to use any of the environments from courses you have already completed to build and test data manipulations and visualizations for your project. Alternatively, you can use your own locally installed environment. Another possibility is to use Deepnote, Google Colaboratory, or VS Code’s Live Share which may facilitate team collaboration. As part of the grading the teaching team may attempt to reproduce your results using your code and data, and you are expected to assist with this if we request it.