Welcome to the Software Carpentry Etherpad for the Oct 6-7 workshop at Harvard University!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org/).

Users are expected to follow our code of conduct: http://software-carpentry.org/conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

We will use this Etherpad during the workshop for chatting, taking notes, and sharing URLs and bits of code.

Instructors:
    Byron Smith, PhD Candidate, University of Michigan - <bsmith89@gmail.com>
    James Mickley, PhD Candidate, University of Connecticut - http://jamesmickley.com - <james.mickley@uconn.edu>

    
Helpers:
    Jeremy Muhlich
    Gabriel Berriz
    Douglas Russell
    David "Quint" Gribbin

Attendees:
Options for lunch:
==============
YES! pizza?

=================================================================================================================

Day 1

Setup:
1. Download and install software from our course website: http://tinyurl.com/harvard-swc
2. Go to Socrative, and put in MICKLEY as the room: https://b.socrative.com/login/student/
3. Put your name under Attendees above (You can get here from the Etherpad link on the course website)
4. Navigate to https://sorgerlab.github.io/2016-10-06-harvard/setup/index.html


Unix Shell
==========
Follow along with what I typed: https://www.dropbox.com/s/09kdy6gyocpxigs/shell.txt
Data for shell: http://swcarpentry.github.io/shell-novice/data/shell-novice-data.zip
A really useful website for figuring out what a complicated line of commands does: http://explainshell.com/


On a Mac, you can get into the shell by opening the application called "Terminal"
On Windows, you should use "Git Bash"

/Users/<USERNAME> means that we're in a directory ("<USERNAME>") inside of another directory ("Users"), which is itself inside the root directory

## Getting Help ##

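ls --help  # on Windows (Git Bash); the ls that ships with macOS may not support --help
man ls    # on Mac (man is usually not available in Git Bash)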
http://man.cx/ls   (in a browser)

hidden files: files whose names begin with a . are "hidden"; they are not shown by default by ls or other tools

ls -F  (-F is a "flag" which changes the behavior of the `ls` command to print a "/" after directories)
(NB: some versions of ls print a "/" after directories by default, without needing the -F flag)
ls -a (show "all" which includes "hidden" files/directories)

~ stands for the home directory

cd <SOME DIRECTORY> (change directories)
cd (with no argument: changes directory to ~)
cd - (change directory to the previous directory)

../ is a sort-of directory which refers to the "parent" directory (the directory which contains our current directory)
./ is another "sort-of" directory, which refers to the current directory itself

The `/` at the front of a path means the "root directory" (the directory that contains all other directories)
Within a path the `/` separates directories going from the "top" directory to the "bottom" (most specific) directory

Naming files:
    1. avoid spaces in file names
    2. avoid leading hyphens in file names
    3. when including dates in file names:
        - use the year-month-day convention (e.g. 2016-01-23)
        - use the 2-digit version of months and days (e.g. 2016-01-01, instead of 2016-1-1)

nano filename : edit a file
While using nano:
    control-X : exit (prompts to save file if you haven't already)
    control-O : save file (prompts for filename if you want to change it)

to remove a file
  rm FILENAME
  rm -r directory  # remove a directory and its contents ("recursive")
  rm -r -i directory  # it will ask you whether it should delete each file and directory ("interactive")
REMEMBER: rm is forever!

to rename or move a file:
    mv original_name new_name  # rename a file
    mv subdirectory/filename .   # move a file from a subdirectory to the current directory

to copy a file:
    cp old_file new_file   # make a copy of old_file, named new_file

NB: cp and mv will overwrite existing files; for example, if one runs
    mv foo.txt bar.txt
and a file bar.txt happens to already exist, it will be overwritten by foo.txt

in the shell, to recall an earlier command use the up arrow
the up and down arrows can be used to navigate through the shell's history
recalled commands can be edited before re-running them


wc FILENAME   # print number of lines, words, and characters in a file (in that order)
# If you don't type a filename, wc waits for input from the keyboard and you will appear "stuck"; press control-C to get out.
wc -l FILENAME  # print just number of lines

* is a wildcard character that matches any number (zero or more) of characters in a filename (except for a leading .)
? is a wildcard character that matches exactly one character (except for a leading .)
*[AB] matches all files whose names end with A or B
( the [AB] style of matching also appears in "regular expressions", a more powerful pattern language than shell wildcards; http://regexr.com is a site where you can learn more about them )

The > character (greater-than) is used after a command to "redirect" the output from the command into a file:
    wc -l *.pdb > lengths.txt
NB: COMMAND > FILENAME will always overwrite FILENAME

The < character (less-than) is used after a command to "redirect" the input to the command from a file:
    wc < methane.pdb

cat FILENAME prints the contents of FILENAME to the terminal
less FILENAME also shows the contents of FILENAME in the terminal, but page by page (less is what is known as a "pager")
to get out of less: type q

sort takes an input file and prints out the lines from the file in sorted order:
    sort file.txt
    sort -n lengths.txt   # the -n flag is required to sort numbers properly

head shows just the first few lines of a file:
    head -n 1 sorted-lengths.txt   # show just the first line
    head -n 5 sorted-lengths.txt  # show the first 5
    head -5 is a shortcut for head -n 5

tail is the counterpart of head: it shows n lines at the end of a file
    tail -n 1 FILENAME  # shows the last line of a file

Q: how to get only the second line of a file
A: use both head and tail; e.g.:
    head -n 2 FILENAME | tail -n 1


The | character (vertical bar or pipe) is used after a command to "pipe" the output from one command directly into another command:
    sort -n lengths.txt | head -n 1   # show the line with the smallest count from lengths.txt, without creating any extra files
Any number of commands may be "piped" together in a line:
    wc -l *.pdb | sort -n | head -n 1

"standard in" : source of input to commands when a file is not specified as an argument
"standard out" : target of output from commands

to kill current command: ^C (control-C)
this is useful when a command appears to be stuck

ASIDE: R has pipes too! The %>% operator (available through the dplyr package) plays the same role

there are no training wheels in the shell

UNIX philosophy: small, single-purpose tools that can be composed to perform complex tasks

echo hi  # just outputs hi to the terminal

for loops:
for VARIABLE in ITEM1 ITEM2 ITEM3 ... ITEMn
do
    ## shell commands using $VARIABLE
    ## where $VARIABLE sequentially takes on the values ITEM1, ITEM2, ITEM3, ..., ITEMn
done

Semicolons can be used to type complex commands (such as for loops) in a single line; e.g.
for filename in *.dat; do head -3 $filename; done


PROGRAMMING TIP: use human-readable variable names rather than cryptic ones (e.g. filename rather than f)

PROGRAMMING TIP: use indentation to indicate logical structure; e.g.
# GOOD
for filename in *.dat
do
    head -3 $filename
done

# NOT SO GOOD
for filename in *.dat

do
head -3 $filename
done

scripts are files that collect multiple shell commands, and that can be run all at once; example:

#!/bin/bash
# script to run the command frobozz on every *.txt file, appending the output to frobozz.log

for filename in *.txt
do
    echo $filename
    frobozz $filename >> frobozz.log
done

Note: the # character means "everything that follows on this line is a comment"; comments are only for the benefit of human readers; the shell otherwise ignores them

The shebang line
If the first line of a script begins with #!, it is called "the shebang line"; what follows this sequence is used as the program that processes the contents of the script file.
For example, because the script above begins with the line

#!/bin/bash

the program /bin/bash will be used to execute the script.





Python
======

Download and unzip on your desktop: http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
Rename the directory to `gapminder`

To open a python command prompt (in shell), type: python
To run a python program in the shell: python somepythonscript.py

Command to open jupyter notebook (in shell): jupyter notebook

To create a notebook in jupyter:

Use SHIFT+Enter to run a command in a jupyter notebook

== Jupyter Notebook == 

Variables in python are not like cells in Excel.  
If one variable was computed from another and you later change the original variable, the computed one does not update automatically
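
A minimal sketch of this behaviour (the variable names and values are made up for illustration, they are not from the workshop):

```python
# Hypothetical variable names, chosen only for illustration.
first = 10
second = first * 2   # second is computed once, right now: 20

first = 50           # changing first later...
print(second)        # ...does not recalculate second
```

To get an updated value, the line that computes second has to be run again.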

Press CTRL+m and then m to change a code cell into a markdown (text) cell.  You can still run it using SHIFT+Enter (it will then display formatted text)

Strings can be combined (or concatenated) with the '+' operator

Variables store different types of data in python

Use the type() function to figure out what kind of data a python variable holds

You can convert between types of variables using functions like str(), float(), int(), bool(), list()
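
A short sketch of type() and the conversion functions working together, assuming some made-up values (none of these variable names come from the workshop):

```python
# Hypothetical values, chosen only for illustration.
age_text = "42"
print(type(age_text))        # the value is a string (str)

age = int(age_text)          # convert the string to an integer
print(type(age))

half = float(age) / 2        # convert to a floating-point number
label = "Age: " + str(age)   # str() is needed: "+" cannot join a string and an integer
print(half, label)
```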

Variable names are case-sensitive.  By convention, Python variable names are written in lowercase, with words separated by underscores (e.g. important_people)

Unlike R and some other languages, Python numbers the items of a list starting from 0, not 1.  E.g. my_list[0] is the first item
A "slice" of a list is a subset of that list: only some of the items.
Strings can be "sliced" too in the same way to get part of the string

every 2nd item example: important_people[0:4:2]  (start at item 0, stop before item 4, step by 2)
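
Indexing and slicing can be sketched as follows. The name important_people is from the notes, but its contents here are invented for illustration:

```python
# The contents of this list are made up; only the variable name is from the notes.
important_people = ["Ada", "Grace", "Alan", "Linus", "Guido"]

first_person = important_people[0]      # indexing starts at 0
first_three = important_people[0:3]     # items 0, 1, 2 (the stop index is excluded)
every_second = important_people[0:4:2]  # items 0 and 2

name = "Carpentry"
first_letters = name[0:3]               # strings slice the same way
print(first_person, first_three, every_second, first_letters)
```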

== Some python built-in functions: ==
    print (Extremely useful for understanding your code or fixing bugs)
    type
    str
    bool
    int
    float
    round
    len


== Getting help in Python ==

To import a Python library, eg pandas or matplotlib: import pandas or import matplotlib
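
A small sketch of importing and using a library. It uses the math module only because it ships with Python; that choice is an assumption made for this example, and pandas or matplotlib are imported the same way:

```python
import math                 # standard-library module, used here only as an example

print(math.cos(0))          # functions are reached as library_name.function_name

import math as m            # "as" gives the library a shorter alias
print(m.sqrt(16))
```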

== Using Pandas (a data manipulation library for Python that works similarly to data frames in R) ==


== Using Matplotlib (a plotting library for Python) == 




=================================================================================================================

Day 2


Python Continued
================


== General instructions for using a Python library (eg. How do I find the cos(x)?) == 


== Using Pandas Continued ==


== For loops in Python ==

For loop syntax:
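for loop_variable in some_list:
    ## indented statements using loop_variable
    ## where loop_variable sequentially takes on the value of each item in some_list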
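
A minimal sketch of a for loop in action, using a made-up list of numbers (not an example from the workshop):

```python
# Hypothetical list and variable names, chosen only for illustration.
squares = []
for number in [1, 2, 3]:
    # the indented lines run once per item in the list
    squares.append(number * number)
    print(number, number * number)
```

The indentation is what tells Python which lines belong to the loop.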

== If statements in Python ==

If statement syntax:
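if variable == test_condition:
    ## statements to run when the first test is true
elif variable == other_test_condition:
    ## statements to run when the first test is false and the second is true
else:
    ## statements to run when none of the tests are true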
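
A small sketch of the if / elif / else syntax inside a loop, with made-up values and labels (these are illustrative, not from the workshop):

```python
# Hypothetical values and labels, chosen only for illustration.
labels = []
for value in [-3, 0, 7]:
    if value == 0:
        label = "zero"
    elif value > 0:
        label = "positive"
    else:
        label = "negative"
    labels.append(label)
    print(value, label)
```

Only one branch runs for each value: the first one whose test is true, or the else branch if none is.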


Accumulators

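In class, we did the following problem: find the product of all the numbers greater than 2 in input_list (final value of accum should be 60)

The solution was: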

input_list = [1, 2, 3, 4, 5]
accum = 1
for num in input_list:
    if num > 2:
        accum = accum * num
print(accum)

To get more practice with the concept of accumulators, try the following variants of the problem we did in class:
1. find the sum of all the numbers greater than 2 in input_list (final value of accum should be 12)
2. produce a list of all the numbers greater than 2 in input_list (final value of accum should be [3, 4, 5])
3. produce a string consisting of the concatenation of the string form of all the numbers greater than 2 in input_list (final value of accum should be the string "345")

Hint: the solutions to the problems above all have the same general structure as the solution to the product problem; only two lines change: the one setting the initial value of the accumulator, and the one updating the value of the accumulator (inside the if-statement).
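
As one more illustration of the same pattern (a variant that was not part of the class exercise), here is a sketch that counts how many numbers in input_list are greater than 2. The name count is chosen for this example:

```python
input_list = [1, 2, 3, 4, 5]
count = 0                  # initial value of the accumulator
for num in input_list:
    if num > 2:
        count = count + 1  # update the accumulator
print(count)
```

Compared with the product solution, only the starting value and the update line differ.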


== Writing your own functions in Python == 




Afternoon: Git
==============
 
Setup
1. Open your shell and cd to your home directory
2. Go to Socrative, and put in MICKLEY as the room: https://b.socrative.com/login/student/
 
Follow along with what I typed: https://www.dropbox.com/s/565vbg0c87bne58/git.txt


== Getting a GitHub Account ==
  1. Go to https://github.com/ and sign up for a new account
  2. Educational professionals can also get GitHub's paid plan for FREE: https://education.github.com/discount_requests/new
  3. This comes with unlimited private repositories