Several of my fellow PhD students here have to do a substantial amount of programming as part of their PhDs: unfortunately, most of them haven't done any programming before. The usual procedure, alas, is to hand them a Fortran compiler and tell them to get on with it, often hacking on a large mass of code written by someone else who was "taught" the same way. I try to do what I can to help, but there's a limit to the amount of time I can devote to someone else's project (and a limit to the amount of time they'd want me to devote, I suspect). But still, I see some horror stories: yesterday, for instance, an office-mate finally tracked down a bug due to a magic number which had been bothering her for longer than she cared to say, and which had been seriously undermining her confidence in her actual model. Not using magic numbers is basic programming practice, but nobody had told her this.
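For anyone who hasn't met the term, here's a minimal sketch of what I mean by a magic number, in Python. The function names, the constant's name and the formula are all made up for illustration: this is not my office-mate's actual code or model.

```python
# Hypothetical example: names and formula are invented for this sketch.

# Before: an unexplained literal buried in the arithmetic. If the same value
# is typed out again elsewhere (perhaps as 9.8), the copies can silently disagree.
def fall_distance_magic(t):
    return 0.5 * 9.81 * t ** 2


# After: the value is named once, documented once, and easy to override.
STANDARD_GRAVITY = 9.80665  # m/s^2, standard acceleration due to gravity


def fall_distance(t, g=STANDARD_GRAVITY):
    """Distance in metres fallen from rest after t seconds, ignoring drag."""
    return 0.5 * g * t ** 2
```

The second version isn't any cleverer; it just gives the reader (and the debugger) one place to look.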
So I've been thinking about an introductory course on programming aimed at maths/science grad students. The emphasis would be on writing maintainable code and modern programming practices: modularity, use of libraries wherever possible, test-first programming, use of debuggers, source control systems and profilers, optimising later (if you have to at all), use of high-level languages, documentation, and so on. My real aim would be to break the cycle of abuse whereby each new generation of grad students is told to write 1000-line-to-a-function, opaque, untested, rape-and-paste Fortran by their supervisors, because it was good enough for their supervisors, and on and on...
Here's a first cut at a course catalogue entry for this fantasy course: I'd be very interested to hear everyone's comments.
Practical Computer Programming for Scientists
The use of computers is now widespread in mathematics and science, but all too few scientists are aware of the techniques that are standard in industry for creating correct, maintainable code. This course is a ground-up introduction to computer programming emphasising code clarity and maintainability, and the use of standard tools like debuggers, profilers, test frameworks and version control systems.
The language of instruction will be Python, a modern multi-paradigm language famed for its simplicity, but most of the lessons of the course (and all of the important ones) will transfer easily to any reasonably mainstream language. Indeed, if you want to learn to program in language X it will almost certainly be faster to learn to program in Python and then learn language X. In addition, Python is a powerful and useful general-purpose programming language in its own right. The differences between Python and other languages will be explained at appropriate points throughout the course.
The course will consist of $weeks_in_term two-hour lab sessions, as follows:
Lab 1: Basics
Hello, World; creating and running Python programs; input/output; variables; loops; conditionals; use and creation of functions; recursion; functions are data; interaction with the filesystem.
Lab 2: Structured data
Strings, integers and floats; lists; dictionaries; trees; graphs; recursion on data-structures; objects and classes; reflection; pickling (serialisation).
Lab 3: Modules
Using standard modules; creating modules of your own; documenting your modules; scope; some useful modules from the standard library; finding and installing new modules from the Web; interfacing to other languages.
Lab 4: Testing and debugging
PyUnit and friends; unit testing versus functional testing; white-box versus black-box testing; what to test; test-first programming; testing versus proofs of correctness; coverage analysis; debugging with print statements; use of the debugger.
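To give a flavour of the kind of thing this lab might cover, here is a minimal sketch using the standard `unittest` module (PyUnit). The `mean` function and the test names are hypothetical; in test-first style the three tests would be written before the function body.

```python
import unittest


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_single_value(self):
        self.assertEqual(mean([4.0]), 4.0)

    def test_several_values(self):
        # assertAlmostEqual avoids relying on exact floating-point equality.
        self.assertAlmostEqual(mean([1.0, 2.0, 6.0]), 3.0)

    def test_empty_input_is_an_error(self):
        with self.assertRaises(ValueError):
            mean([])
```

Saved as, say, test_mean.py (a made-up filename), this could be run with `python -m unittest test_mean`.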
Lab 5: Version control
Basic concepts of version control; use of (cvs|subversion|darcs|whatever we have available); branching and merging; regression testing.
Lab 6: Text munging
String manipulation; globs; regular expressions; parser generators and Backus-Naur Form; parsing and manipulating XML; analysing data in textual form; code that writes code.
Lab 7: GUI programming and event-driven programming
Writing Graphical User Interfaces with [Python folks: what's the best GUI toolkit to use for this? There seem to be so many...]; event-driven programming for GUIs; event-driven programming in other contexts, including SAX.
Lab 8: Numeric and array processing
Array programming; SciPy and its capabilities; limitations of floating-point arithmetic; IEEE special values.
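As a taste of the floating-point material, a short sketch using only the standard library (the variable names are just for illustration):

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is
# not exactly equal to the literal 0.3.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# IEEE 754 special values.
inf = float("inf")
nan = float("nan")
print(inf > 1e308)            # True: infinity exceeds every finite float
print(nan == nan)             # False: NaN is not equal to anything, itself included
print(math.isnan(inf - inf))  # True: inf - inf is undefined, so the result is NaN
```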
Lab 9: Optimisation
When to optimise; use of profilers; basic complexity theory.
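A rough sketch of the "measure before you optimise" idea, using the standard library's `cProfile` and `pstats` modules. The function being profiled is a deliberately dull, made-up example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical workload: sum of i*i for i in range(n), done the slow way."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect timing data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
slow_sum_of_squares(100_000)
profiler.disable()

# Report the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The point of the exercise would be to look at the report first and only then decide whether anything is worth speeding up.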
Lab 10: Round-up
Summary of good programming practice; anything we missed.
Notes:
- Python really feels like the obvious choice for this. I want something high-level, to teach by contrast the unnecessary pain of using low-level languages. Java, Ruby and (especially) Perl have syntax that's too complex: I want to spend the absolute minimum time on syntax and the absolute maximum on concepts, and Python's the most syntactically simple mainstream language that I know of. Java is also too tightly wedded to the OO paradigm. They'd probably have a better shot at Ultimate 1337th if I started them off on Haskell, or Scheme, or C, or J, but that's not the aim: I want to get them to the stage where they can write useful code in support of their actual work without shooting themselves in the foot more than necessary. Hence we need a language that's reasonably similar to (while still obviously better than) what they'll actually be using (probably C/C++, or Matlab, or (spit!) Fortran). And Python has an excellent standard library, which would be a great help in teaching the lesson that you should rely on your libraries wherever possible.
- It would be nice to have some overall goal for the course: write a small Lisp interpreter, or a game, or a raytracer, or something.
- There's no concurrent or parallel programming in there. It's not something I know a lot about, but it's probably pretty useful for people who'll end up doing hardcore numeric stuff. Where should it go? What should be in a lab on it?
- There's nothing specifically about Web programming in there. Or databases. Databases would be a good one for another lab if $weeks_in_term > 10. Or maybe it should kick something out. But what? GUI programming? Text munging?
- Lab 1 might be a bit over-full, and Labs 5 and 9 might be a bit short.
- I think Lab 3 (modules) should occur after we've used a couple of standard modules (here, sys and pickle). But the precise ordering of Labs 2-5 is tricky. Maybe I should move use and creation of functions to the Modules lab, and re-name it (The Religion of) Modularity. And they're certainly going to have done some debugging already by the time they get to Lab 4.
- What "useful modules" should go in the Modules lab? Or should I spread the standard modules throughout the course?
- Where should "scope" go?
- Lab 6 would really be a more general lab on data-munging, but examples would mostly be textual - sequences of DNA bases, Unix config files, and so on. Maybe I could get them to write a symbolic differentiator or something.
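For the symbolic differentiator idea in the last note, here's a rough sketch of the sort of thing I have in mind. The representation is my own assumption for illustration: expressions are numbers, variable names (strings), or tuples of the form (operator, left, right), with only "+" and "*" supported and no simplification of the result.

```python
def diff(expr, var):
    """Differentiate expr with respect to var, returning an unsimplified tree."""
    if isinstance(expr, (int, float)):
        return 0                              # d(constant)/dx = 0
    if isinstance(expr, str):
        return 1 if expr == var else 0        # dx/dx = 1, dy/dx = 0
    op, left, right = expr
    if op == "+":                             # sum rule
        return ("+", diff(left, var), diff(right, var))
    if op == "*":                             # product rule
        return ("+",
                ("*", diff(left, var), right),
                ("*", left, diff(right, var)))
    raise ValueError("unknown operator: %r" % (op,))


def evaluate(expr, env):
    """Evaluate an expression tree given a dict mapping variable names to values."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    if op == "+":
        return evaluate(left, env) + evaluate(right, env)
    if op == "*":
        return evaluate(left, env) * evaluate(right, env)
    raise ValueError("unknown operator: %r" % (op,))


# x*x + 3*x, whose derivative 2*x + 3 equals 7 at x = 2.
expr = ("+", ("*", "x", "x"), ("*", 3, "x"))
print(evaluate(diff(expr, "x"), {"x": 2}))    # 7
```

It also happens to exercise trees and recursion on data structures from Lab 2, which is part of the appeal.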