Data Science Computing Project: Tentative Schedule

(Syllabus)

Day #: Date Subject Homework Due (11:59 p.m.)
1: Tu 1/21/25 Log in to a Linux computer
Why learn Linux? See TOP500 operating systems and Linux
Basic Linux commands
Read introductory email
2: Th 1/23 Mention programmer virtues.
Basic Linux, continued (continue with sed)
3: Tu 1/28 FYI: CHTC researcher forum We 2/26/25
Preview HPC (if you did not do this already)
emacs text editor: reference sheet, demo1 (data.txt, tiny.R, sifting.txt)
Solve basic Linux exercises
4: Th 1/30 emacs, continued: demo2
emacs regular expressions
Q01: Linux
Q01
5: Tu 2/4 (Dislike emacs? Try nano.)
Mention VS Code as an alternative to emacs (thanks, Zhaoqing)
Lyman-break galaxies
discuss HW2
HW1 help
6: Th 2/6 Lyman-break galaxies, continued: search ideas
HW2 help
HW1: emacs
7: Tu 2/11 HW2 help
Q02: emacs
Q02
8: Th 2/13 git/GitHub version control system
HW2: galaxies on public0[234]
9: Tu 2/18 Group1: Git Exercise (TA GitHub ID: Ming5723)
10: Th 2/20 Linux (bash) shell scripting
Group1: Git Exercise
11: Tu 2/25 shell scripting, continued (from p. 4)
12: Th 2/27 Q03: git
Group2: scripting exercises
Q03
13: Tu 3/4 Group2: scripting exercises, continued
14: Th 3/6 discuss project
Statistics High Performance Computing Cluster (HPC)
Group2: shell scripting
15: Tu 3/11 Statistics HPC, continued (from end of 4jobArray: run it!)
Q04: shell scripting (delayed to Tu 3/18)
Group4(a): Project group
Q04 (delayed to Tu 3/18)
16: Th 3/13 Check project groups, report troubles
discuss HW3
High Throughput Computing at CHTC: guest lecture by Amber Lim and Danny Morales (slides, documentation, help)
17: Tu 3/18 revisit seff on slurm-submit-00
CHTC commands, examples, and references: tinyExamples.tar
Slurm vs. HTCondor
HW3 help
develop project proposals
Q04
Q04
18: Th 3/20 parallel sd example (wget http://www.stat.wisc.edu/~jgillett/DSCP/CHTC/sd.tar; run condor_submit_dag sd.dag)
HW3 help
develop project proposals
HW3: airlines on Slurm (delayed to Tu 4/1)
Group4(b): project proposal
[Tu 3/25, Th 3/27]
[no class: spring break]
19: Tu 4/1 parallel sd example, continued (run it!)
Group3: parallel word counting
HW3 (delayed to here from 3/20)
20: Th 4/3 schedule proposal feedback meetings
Using R at CHTC (wget http://www.stat.wisc.edu/~jgillett/DSCP/CHTC/calling_R_or_python.tar)
Group3: parallel word counting, continued
21: Tu 4/8 discuss HW4
Group3: CHTC, continued
Group4: project development
Q05: distributed computing (Slurm and HTCondor)
Q05
22: Th 4/10 project proposal feedback: meet in class with teacher and TA
proposal feedback meeting in class
Group3: CHTC
23: Tu 4/15 FYI: undergraduate research
Optional CHTC /staging/groups/stat_dscp/group01 ... group13 folders for large files (< 200 GB)
HW4 help, project help

24: Th 4/17 set presentation schedule
HW4 help, project help
HW4a: galaxies on CHTC
25: Tu 4/22 Save files from public0[234], slurm-submit-00, and learn to your laptop, email, GitHub, etc. by 5/6/25.
project help
HW4b: more galaxies (extended to Th 4/24)
Mo 4/28: presentation slides
26: Th 4/24 project help
27: Tu 4/29 Group4(c): first half of project presentations
28: Th 5/1 Group4(c): second half of project presentations
Group4(d): project report