{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Workshop: working with files\n",
"\n",
"### February 14, 2022"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 1: reading files (warmup)\n",
"\n",
"Write a function `warmup_fn` that takes a string (encoding a filename) and a character (i.e., a length-1 string), and returns an integer corresponding to how many times that character appears in the file.\n",
"\n",
"Include appropriate error checking to ensure that both arguments are strings and that the character is indeed a length-1 string.\n",
"\n",
"Think about how your function should behave if the given file does not exist."
]
},
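{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the missing-file question above, it may help to see what Python does by default. The sketch below is only an illustration, not a solution to Problem 1: it tries to open a path that cannot exist and records the exception that `open` raises. The file name `no_such_file.txt` and the variables `outcome` and `caught` are made up for this example.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# A freshly created temporary directory is empty, so any path inside it\n",
"# is guaranteed not to exist yet.\n",
"with tempfile.TemporaryDirectory() as tmpdir:\n",
"    missing_path = os.path.join(tmpdir, 'no_such_file.txt')  # hypothetical name\n",
"    try:\n",
"        with open(missing_path, 'r') as f:\n",
"            f.read()\n",
"        outcome = 'read'\n",
"    except FileNotFoundError as err:\n",
"        # Keep a reference: the name bound by `as` is cleared after this block.\n",
"        outcome = 'missing'\n",
"        caught = err\n",
"\n",
"print(outcome)                      # prints missing\n",
"print(isinstance(caught, OSError))  # prints True\n",
"```\n",
"\n",
"Whether `warmup_fn` should let this exception propagate, catch it and raise something with a clearer message, or return a sentinel value is a design choice left to you."
]
},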
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# CODE GOES HERE."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can test your function on the file stored here: https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/count_test.txt\n",
"\n",
"This file contains only the text `The quick brown fox jumps over the lazy dog.`.\n",
"\n",
"For example, `warmup_fn('count_test.txt', 'q')` should return `1`, while `warmup_fn('count_test.txt', 'Z')` should return `0` (because while there is a lower-case `z`, there is no upper-case `Z`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use your function to count how many times the character `e` appears in the Universal Declaration of Human Rights.\n",
"You can find a `.txt` file containing the text of the declaration here: https://pages.stat.wisc.edu/~kdlevin/teaching/Spring2022/STAT679/democode/udhr.txt."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 2: creating files\n",
"\n",
"Define a function `save_my_list_naive` (we're calling it naive because we're going to implement a more clever version below) that takes two arguments: a string and a list.\n",
"The string should specify a filename, and your function should create that file, and write each element of the list to the file, one line per item, casting each item to a string.\n",
"If the file does not exist, your function should create it.\n",
"If the file already exists, this function should overwrite it.\n",
"\n",
"So, for example, `save_my_list_naive('list.txt', [1,'cat',(1,2,3)])` should create a file `list.txt` with three lines (one for each element of the list), whose lines are\n",
"\n",
"- '1'\n",
"- 'cat'\n",
"- '(1,2,3)'\n",
"\n",
"There is no need to perform error checking in this function unless you really want to-- you'll find that Python's file operations should do most of the error checking for you.\n",
"That being said, you could include a check that the first argument is indeed a string.\n",
"\n",
"Hint: try running `str( (1,2,3) )` and `str( [1, 'a', 3.14] )` to see what happens when you cast a tuple or list to a string.\n",
"\n",
"Bonus challenge: You should be able to write your function in such a way that the second argument can be any sequence-like object (e.g., a string, a tuple, a dictionary), and the function will still behave in a (somewhat) reasonable way."
]
},
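{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the hint and the bonus challenge concrete, here is a small sketch of what `str` and iteration do with a few built-in types. It deliberately stops short of writing any file. The helper `items_as_text` is a hypothetical name used only for this illustration.\n",
"\n",
"```python\n",
"print(str((1, 2, 3)))        # prints (1, 2, 3)\n",
"print(str([1, 'a', 3.14]))   # prints [1, 'a', 3.14]\n",
"\n",
"\n",
"def items_as_text(collection):\n",
"    # Hypothetical helper: iterate over `collection` and cast each item to str.\n",
"    return [str(item) for item in collection]\n",
"\n",
"\n",
"from_string = items_as_text('cat')            # iterating a string yields characters\n",
"from_tuple = items_as_text((1, 2.5))          # iterating a tuple yields its elements\n",
"from_dict = items_as_text({'x': 1, 'y': 2})   # iterating a dict yields its keys only\n",
"\n",
"print(from_string)\n",
"print(from_tuple)\n",
"print(from_dict)\n",
"```\n",
"\n",
"Note that iterating over a dictionary visits only the keys, so the values are silently dropped. Whether that counts as reasonable behavior is up to you."
]
},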
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#CODE GOES HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the natural thing to do next would be to define a companion function `load_my_list`, that takes a file name as its argument and reads the file, one line at a time, turning the arguments back from strings to their appropriate objects.\n",
"\n",
"A few moments of thought should reveal why this doesn't quite work-- Python doesn't automatically know whether a string encodes an integer, a list, a dictionary or... well, just a string!\n",
"\n",
"It seems we are in a bit of a pickle..."
]
},
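{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ambiguity described above can be seen without touching a file. In this sketch (the variable names `items` and `as_text` are chosen only for illustration), different objects produce exactly the same text once they are cast to strings, so nothing on the line says which one was saved.\n",
"\n",
"```python\n",
"items = [1, '1', (1, 2, 3), '(1, 2, 3)']\n",
"as_text = [str(item) for item in items]\n",
"\n",
"print(items[0] == items[1])       # prints False: an int is not a str\n",
"print(as_text[0] == as_text[1])   # prints True: both become the text 1\n",
"print(as_text[2] == as_text[3])   # prints True: both become the text (1, 2, 3)\n",
"```\n",
"\n",
"Once two different objects map to the same line of text, no loader can tell them apart from the file alone."
]
},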
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 3: in a pickle\n",
"\n",
"The Python `pickle` module is designed to solve exactly the issue that we just ran into.\n",
"It provides a way of encoding objects of any sort-- strings, lists, dictionaries... even more complicated objects that we define ourselves, though that will need to wait another week or two.\n",
"\n",
"Define a function `save_my_list` that has the same signature as `save_my_list_naive` (Reminder: the \"signature\" of a function is just the number of arguments it takes and the types of those arguments).\n",
"This function should take each element of the list, pickle it, and write the resulting string to its own line in the file."
]
},
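{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before writing `save_my_list`, it may help to watch a single object make the round trip in memory. This is a minimal sketch rather than a solution. The names `original`, `blob`, `restored`, and `NEWLINE_BYTE` are invented for the example, and only `pickle.dumps` and `pickle.loads` from the standard library are used.\n",
"\n",
"```python\n",
"import pickle\n",
"\n",
"original = {'name': 'cat', 'legs': 4, 'coords': (1, 2, 3)}\n",
"blob = pickle.dumps(original)    # serialize to a bytes object\n",
"restored = pickle.loads(blob)    # rebuild an equal object from those bytes\n",
"\n",
"print(type(blob).__name__)       # prints bytes\n",
"print(restored == original)      # prints True: same contents and types\n",
"print(restored is original)      # prints False: it is a new object\n",
"\n",
"# Byte value 10 is the newline character. Pickled data may contain it,\n",
"# which matters if a file of pickles is later split on line breaks.\n",
"NEWLINE_BYTE = 10\n",
"print(NEWLINE_BYTE in pickle.dumps(10))   # prints True\n",
"```\n",
"\n",
"Because a pickle can contain newline bytes, splitting a file on line breaks may cut a pickle in half. One possible way around this, not necessarily the intended one, is to call `pickle.dump` once per element on the open file and then call `pickle.load` repeatedly when reading, since each call reads exactly one pickled object."
]
},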
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#CODE GOES HERE."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, write the companion function that we wanted to write before.\n",
"`load_my_list` should take a string as its only argument and return a list.\n",
"The string argument is assumed to encode the name of a file in which each line is a pickled object.\n",
"The function should read this file one line at a time, \"unpickling\" each line, and saving the \"unpickled\" object as an element of a list.\n",
"The resulting list should have as its `i`-th element the result of \"unpickling\" the `i`-th line of the file.\n",
"\n",
"As a sanity check, for a list `t1`, after running `save_my_list('test.txt', t1); t2=load_my_list('test.txt')`, the lists `t1` and `t2` should be identical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#CODE GOES HERE."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}