# Workshop: working with files

### February 13, 2022

In [None]:
# Question: why does write return the number of characters written?

# Answer: if something goes wrong, we want to know where to "pick up" again
#         similarly, if we are trying to check that we wrote to a file successfully,
#         we can check that the number of characters written successfully matches the
#         length of the string we tried to write.
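As a small illustration of the answer above, the sketch below writes a string to a throwaway file and compares the value returned by `write` with the length of the string. The file name `demo.txt` and the variable names are made up for this example.

```python
import os
import tempfile

text = "The quick brown fox"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        n_written = f.write(text)   # write returns the number of characters written
    with open(path, "r") as f:
        contents = f.read()         # read returns the file contents as a string

# If the whole string made it to the file, the two counts agree.
wrote_everything = (n_written == len(text))
```

If `wrote_everything` were ever `False`, `n_written` would tell us how far into `text` the write got.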

## Problem 1: reading files (warmup)

Write a function `warmup_fn` that takes a string (encoding a filename) and a character (i.e., a length-1 string), and returns an integer corresponding to how many times that character appears in the file.

Include appropriate error checking to ensure that both arguments are strings and that the character is indeed a length-1 string.

Think about how your function should behave if the given file does not exist.

In [1]:
def warmup_fn( fname, c ):
    '''
    fname is a string, encoding a file name
    c is a character (i.e., length-1 string).
    
    Return the number of times (as an int) that c appears in the given file
    '''
    
    # TODO: code goes here.

You can test your function on the file stored here: <a href="https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/count_test.txt">https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/count_test.txt</a>

This file contains only the text `The quick brown fox jumps over the lazy dog.`.

For example, `warmup_fn('count_test.txt', 'q')` should return `1`, while `warmup_fn('count_test.txt', 'Z')` should return `0` (because while there is a lower-case `z`, there is no upper-case `Z`).
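One possible sketch of `warmup_fn` is shown here; it is not the only reasonable answer. It assumes that a missing file should be reported by letting `open` raise `FileNotFoundError` (returning `0` would be another defensible choice), and it recreates the test file locally in a temporary directory so that the example does not depend on a download.

```python
import os
import tempfile

def warmup_fn(fname, c):
    '''Return how many times the character c appears in the file fname.'''
    if not isinstance(fname, str):
        raise TypeError('fname must be a string')
    if not isinstance(c, str):
        raise TypeError('c must be a string')
    if len(c) != 1:
        raise ValueError('c must be a single character')
    # Assumption: a missing file is signalled by the FileNotFoundError
    # that open() raises, rather than by returning 0.
    with open(fname, 'r') as f:
        return sum(line.count(c) for line in f)

# Recreate the test file locally (same text as count_test.txt).
test_path = os.path.join(tempfile.mkdtemp(), 'count_test.txt')
with open(test_path, 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog.')

q_count = warmup_fn(test_path, 'q')
Z_count = warmup_fn(test_path, 'Z')
```

Reading the file line by line keeps memory use small even for large files; for a short file, `f.read().count(c)` would work just as well.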

Use your function to count how many times the character `e` appears in the <a href="https://en.wikipedia.org/wiki/Universal_Declaration_of_Human_Rights">Universal Declaration of Human Rights</a>.
You can find a `.txt` file containing the text of the declaration here: <a href="https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/udhr.txt">https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/udhr.txt</a>.

In [12]:
warmup_fn('udhr.txt', 'e')

1046

## Problem 2: creating files

Define a function `save_my_list_naive` (we're calling it naive because we're going to implement a more clever version below) that takes two arguments: a string and a list.
The string should specify a filename, and your function should create that file, and write each element of the list to the file, one line per item, casting each item to a string.
If the file does not exist, your function should create it.
If the file already exists, this function should overwrite it.

So, for example, `save_my_list_naive('list.txt', [1,'cat',(1,2,3)])` should create a file `list.txt` with three lines (one for each element of the list), whose lines are

- '1'
- 'cat'
- '(1, 2, 3)'

There is no need to perform error checking in this function unless you really want to; you'll find that Python's file operations do most of the error checking for you.
That being said, you could include a check that the first argument is indeed a string.

<b>Hint:</b> try running `str( (1,2,3) )` and `str( [1, 'a', 3.14] )` to see what happens when you cast a tuple or list to a string.
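For reference, the two casts in the hint can be tried directly. The variable names below are only for illustration.

```python
# Casting a tuple or a list to a string gives its printed representation.
tuple_as_str = str((1, 2, 3))
list_as_str = str([1, 'a', 3.14])

# Both results are plain strings, not a tuple or a list.
both_are_strings = isinstance(tuple_as_str, str) and isinstance(list_as_str, str)
```

Note that Python inserts a space after each comma, so `tuple_as_str` is `'(1, 2, 3)'`.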

<b>Bonus challenge:</b> You should be able to write your function in such a way that the second argument can be any iterable object (e.g., a string, a tuple, or a dictionary), and the function will still behave in a (somewhat) reasonable way.

In [15]:
def save_my_list_naive( fname, t ):
    '''
    fname is a string encoding a filename.
    t is a list (or sequence-like object)
    
    Opens fname for writing, over-writing any existing such file,
    and writes each element of t to that file, casting to string as needed,
    one element per line.
    '''
    
    # TODO: code goes here.
    

Now, the natural thing to do next would be to define a companion function `load_my_list` that takes a file name as its argument and reads the file, one line at a time, turning each line back from a string into its original object.

A few moments of thought should reveal why this doesn't quite work: Python doesn't automatically know whether a string encodes an integer, a list, a dictionary or... well, just a string!
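To make the problem concrete, here is a minimal sketch. It uses one possible implementation of `save_my_list_naive` (an assumption, not the required solution) and then reads the file back line by line.

```python
import os
import tempfile

def save_my_list_naive(fname, t):
    '''Write each element of t to fname, one per line, cast to str.'''
    with open(fname, 'w') as f:   # 'w' creates the file or overwrites it
        for item in t:
            f.write(str(item) + '\n')

path = os.path.join(tempfile.mkdtemp(), 'list.txt')
original = [1, 'cat', (1, 2, 3)]
save_my_list_naive(path, original)

# Read the lines back, dropping the trailing newline from each one.
with open(path, 'r') as f:
    recovered = [line.rstrip('\n') for line in f]

# Every recovered element is a str: the int and the tuple are gone.
recovered_types = [type(x) for x in recovered]
```

Here `recovered` is `['1', 'cat', '(1, 2, 3)']`, which is not equal to `original`, and nothing in the file says which lines used to be something other than a string.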

It seems we are in a bit of a pickle...

## Problem 3: in a pickle

The Python `pickle` module is designed to solve exactly the issue that we just ran into.
It provides a way of encoding objects of any sort: strings, lists, dictionaries, and even more complicated objects that we define ourselves, though that will need to wait another week or two.

Define a function `save_my_list` that has the same signature as `save_my_list_naive` (<b>Reminder:</b> the "signature" of a function is just the number of arguments it takes and the types of those arguments).
This function should take each element of the list, pickle it, and write the resulting byte string to its own line in the file.

In [19]:
import pickle

# NOTE: pickle.dumps returns a byte string, which cannot be written to a
# file opened in text mode. Read the documentation for open and f.write
# to see what to do!
def save_my_list( fname, t ):
    # TODO: code goes here.
    pass

Now, write the companion function that we wanted to write before.
`load_my_list` should take a string as its only argument and return a list.
The string argument is assumed to encode the name of a file in which each line is a pickled object.
The function should read this file one line at a time, "unpickling" each line, and saving the "unpickled" object as an element of a list.
The resulting list should have as its `i`-th element the result of "unpickling" the `i`-th line of the file.

As a sanity check, for a list `t1`, after running `save_my_list('test.txt', t1); t2=load_my_list('test.txt')`, the lists `t1` and `t2` should be identical.
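One possible sketch of the pair of functions follows. A caveat motivates its design: the bytes returned by `pickle.dumps` can themselves contain the newline byte (for example, when pickling the integer `10`), so writing raw pickles one per line is fragile. As a workaround, and this is a swapped-in assumption rather than necessarily the intended solution, each pickle is base64-encoded before it is written, which guarantees one line per element.

```python
import base64
import os
import pickle
import tempfile

def save_my_list(fname, t):
    '''Pickle each element of t and write it to its own line of fname.'''
    with open(fname, 'wb') as f:   # binary mode, since pickle.dumps returns bytes
        for item in t:
            # Assumption: base64-encode so the line contains no stray newline.
            f.write(base64.b64encode(pickle.dumps(item)) + b'\n')

def load_my_list(fname):
    '''Return the list of objects stored in fname by save_my_list.'''
    with open(fname, 'rb') as f:
        return [pickle.loads(base64.b64decode(line.strip())) for line in f]

# Round trip on a list that includes the troublesome integer 10.
t1 = [1, 'cat', (1, 2, 3), 10, {'a': [3.14]}]
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
save_my_list(path, t1)
t2 = load_my_list(path)
```

After the round trip, `t1 == t2` holds, and the third element of `t2` is a tuple again rather than a string.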

In [None]:
def load_my_list( fname ):
    # TODO: code goes here
    pass

In [None]:
t1 = [1, 'cat', (1, 2, 3)]
save_my_list('test.txt', t1)
t2 = load_my_list('test.txt')
# t1 and t2 should be identical
assert t1 == t2

In [21]:
# NOTE: an easier way to do the above would be to just pickle the whole list as-is,
# and let pickle handle recursing on the list (which it will do!)
# Something like
t = [1,'cat',3.14]
pickle.dumps(t)

b'\x80\x03]q\x00(K\x01X\x03\x00\x00\x00catq\x01G@\t\x1e\xb8Q\xeb\x85\x1fe.'
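A minimal sketch of that easier route, pickling the whole list straight to a file with `pickle.dump` and reading it back with `pickle.load`, might look like this. The file name `whole_list.pkl` is made up for the example.

```python
import os
import pickle
import tempfile

t = [1, 'cat', 3.14]
path = os.path.join(tempfile.mkdtemp(), 'whole_list.pkl')  # hypothetical file name

# Files holding pickles must be opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(t, f)

with open(path, 'rb') as f:
    t_loaded = pickle.load(f)
```

Because the whole list is stored as a single pickle, there is no need to worry about line boundaries, and `t_loaded == t` holds after the round trip.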