Python Data Science Handbook [on GitHub]
Jake VanderPlas, O'Reilly Media, 2016
Course website
Neural Networks and Deep Learning
Michael Nielsen
A Whirlwind Tour of Python [on GitHub pdf]
Jake VanderPlas, O'Reilly Media, 2016
CoCalc (Website for sending your homework.)
Kaggle (Website for finding datasets.)
A 9-hour Python tutorial focusing on data processing (A brief introduction to Python basics, NumPy, pandas, and matplotlib.)
Jephian Lin
CommonMark (You may find Markdown tutorials here.)
Machine learning has proven powerful on many specific problems. To make it work, one has to clean up the data and apply appropriate algorithms to it. In this course, we will learn how to process and analyze data with Python. We will introduce Python packages including NumPy (for scientific computing on arrays), pandas (for processing data), and scikit-learn (for machine learning). If time allows, we will go through some examples with Keras (for neural networks) and matplotlib (for data visualization). With these tools at hand, you will find it much easier to learn further and newer techniques in machine learning.
30% Homework + 20% Midterm + 30% Presentation + 20% Report + extra 5% participation
You earn 0.5 participation points whenever you ask a question, correct a mistake, or share your thoughts in class. Each person can earn at most 5 participation points. Record your activities using the link above.
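The evaluation rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, each component is assumed to be given in points out of its maximum (30, 20, 30, 20), and the syllabus does not say whether the total is capped at 100, so no cap is applied here.

```python
def participation_points(num_activities, per_activity=0.5, cap=5.0):
    """Participation points: 0.5 per recorded activity, at most 5 in total."""
    return min(cap, per_activity * num_activities)


def course_score(homework, midterm, presentation, report, num_activities):
    """Sum the four components (in points) and add the participation bonus.

    Assumption: components are already expressed in points out of
    30, 20, 30, and 20 respectively.
    """
    base = homework + midterm + presentation + report
    return base + participation_points(num_activities)


# 12 recorded activities would be worth 6 points, but the cap keeps it at 5.
print(participation_points(12))
# 85 base points plus 4 activities (2 participation points).
print(course_score(27, 15, 25, 18, num_activities=4))
```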
Weekly homework will be assigned through the CoCalc system. (So it is important that you have a CoCalc account.) There will be 10 homework assignments. Each assignment counts for 2 points, and you earn 1 more point for grading another student's homework.
For example, after HW1 is collected, I will randomly distribute the submissions into your CoCalc projects; during the week that you are working on HW2, you have to finish grading another student's HW1. Similarly, within one week after I collect HW10, you have to finish grading another student's HW10.
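As a quick check that the homework rules add up to the 30% listed in the evaluation, here is a small sketch. The names are hypothetical, and it assumes that each submission receives a score between 0 and 2 and that grading a peer's homework is all-or-nothing (1 point or 0).

```python
NUM_ASSIGNMENTS = 10
POINTS_PER_SUBMISSION = 2  # maximum points for your own homework
POINTS_PER_GRADING = 1     # points for grading another student's homework


def max_homework_points():
    """Maximum homework total: 10 assignments, each worth 2 + 1 points."""
    return NUM_ASSIGNMENTS * (POINTS_PER_SUBMISSION + POINTS_PER_GRADING)


def homework_points(submission_scores, graded_flags):
    """Total homework points so far.

    submission_scores: score (0 to 2) for each of your submissions.
    graded_flags: True if you finished grading the peer homework on time.
    """
    own = sum(min(POINTS_PER_SUBMISSION, s) for s in submission_scores)
    grading = sum(POINTS_PER_GRADING for done in graded_flags if done)
    return own + grading


print(max_homework_points())  # 30, matching the 30% homework weight
print(homework_points([2, 1.5, 0], [True, False, True]))  # 3.5 + 2 = 5.5
```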
The presentation counts for 30 points. You may present in either English or Chinese, but the explanations in your Jupyter notebook (or slides) have to be in English. Find something interesting on Kaggle (or other resources), and share with us how you process and analyze the data. Each student has 10 minutes to present, followed by a 5-minute question period. All students are encouraged to ask questions; remember you can still earn participation points (see the Evaluation section). All topics are welcome, and the assessment will be based on the following criteria.
Assessment:
The report counts for 20 points. It can be in English or Chinese, but you have to write it as a Jupyter notebook. Choose a machine learning algorithm, study its details, write an introduction to it, and implement it in Python without using existing implementations such as scikit-learn, Keras, and so on.
For example, you may introduce polynomial regression, logistic regression, support vector machines, naive Bayes classification, or any machine/statistical learning algorithm you like. Remember that you may use Markdown for elegant typesetting; see the tutorials on CommonMark. Note that the implementation (and its documentation) is an important part of your report.
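To give a rough idea of what "implementing without existing implementations" can look like, here is a minimal sketch of simple linear regression (polynomial regression of degree 1) using the closed-form least-squares solution in plain Python. The function names and the toy data are invented for this illustration; a real report would cover a richer algorithm and explain the derivation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Uses the closed-form solution:
        slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
        intercept = mean_y - slope * mean_x
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
print(predict(4, slope, intercept))
```

In a notebook, the formulas in the docstring would normally go into a Markdown cell with proper typesetting, next to the code that implements them.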
You may submit your report through email. The deadline is 11:59 pm, Jan 5. I will reply to your email within one day; this makes sure that your report is delivered correctly. If you really need more time, please notify me before 11:59, Dec 29.
Assessment:
Students with diverse learning styles and needs are welcome in this course. In particular, if you have a disability/health consideration that may require accommodations, please feel free to approach me.
Percentage scores will be converted to letter grades according to the university-wide standard table.
You are expected to attend the classes.
If you miss some course components due to illness, accident, family affliction, or religious observances, please talk to me and provide documentation. In such cases, the course component is excused, and your course score will be calculated by distributing the weight of the missed item(s) across the other course components. Excused components are limited to at most 20% of the course score.
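One possible reading of "distributing the weight of the missed item(s) across the other course components" is a proportional rescaling, sketched below. This is an assumption made for illustration, not an official formula: the names are hypothetical, only whole components can be excused in this sketch, and the participation bonus is left out.

```python
WEIGHTS = {"homework": 30, "midterm": 20, "presentation": 30, "report": 20}
MAX_EXCUSED = 20  # at most 20% of the course may be excused


def adjusted_score(fractions, excused=()):
    """Course score (0 to 100) after excusing whole components.

    fractions: component name -> fraction of that component earned (0 to 1).
    excused: names of excused components; their weight is spread over the
             remaining components in proportion to the original weights.
    """
    excused_weight = sum(WEIGHTS[name] for name in excused)
    if excused_weight > MAX_EXCUSED:
        raise ValueError("excused components exceed the 20% limit")
    remaining_weight = sum(WEIGHTS.values()) - excused_weight
    earned = sum(
        WEIGHTS[name] * fractions[name]
        for name in WEIGHTS
        if name not in excused
    )
    return 100 * earned / remaining_weight


# Midterm excused: the other 80 points are rescaled to count for 100.
score = adjusted_score(
    {"homework": 0.9, "presentation": 0.8, "report": 0.75},
    excused=("midterm",),
)
print(round(score, 2))  # 66 / 80 * 100 = 82.5
```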