Key Ideas

  • A csv (comma-separated values) file is a common file format. The file grades.csv is an example of a csv file that might be used to keep track of grades in this class.
  • View how the file renders in Excel (or Google Sheets) and in the text editor. The newline character indicates a new row and the commas indicate a new cell.
  • Creating a new file from an existing file is a common task for a computer scientist.

Active Learning

  • Write a Python program that creates a file named grades2.csv. The new file should contain all of the information from grades.csv In addition, there should be a fourth column entitled Average. For each student, the value in the new column should be the average of the two grades.

Creating the File Objects

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

input_file.close()
output_file.close()

Editing the Header

The first line of a csv file usually contains the names of each column, so it is unique from the other lines. We usually handle this first line outside any kind of for- or while-loop.

To remove the “\n” (newline character) at the end of the header, we can slice it off using line[:-1]. This is called “mungeing the data”–editing it into a proper or more useable format.

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

# Edit Header
header = input_file.readline()    # grabs header only
header = header[:-1] + ", Average"	# slice off the \n at the end of the line

input_file.close()
output_file.close()

Munge the Data Lines

We will run into the same problem with newline characters at the end of each of our csv lines. We can deal with this by using the same slice technique we did with the header. Then we use the split(", ") method to separate the line into elements in a list.

It’s a good idea to test your code with lots of print() statements to make sure your variables mean what you think they mean (that they actually contain the data you think). At this point, we want to use print(values) to test how our code is processing the data of the csv file.

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

# Edit Header
line = input_file.readline()    # grabs header only
line = line[-1] + ", Average"	# slice off the \n at the end of the line

for line in input_file:
    line = line[:-1]
    values = line.split(", ") 

    print(values)

input_file.close()
output_file.close()

Calculating the Average

The next step is to calculate the average score for each student. Once again, we should test our code with print(values) and print(line).

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

# Edit header
header = input_file.readline()
header = header[:-1]
header += ", Average"
output_file.write(header + "\n")

# Edit data
for line in input_file:
    line = line[:-1]
    values = line.split(", ")

    # find average
    average = round((int(values[1]) + int(values[2])) / 2)
    print(line + "\n")

input_file.close()
output_file.close()

Adding Average Score to CSV Line

Now that we are calculating the average scores correctly for each student, the next step is to concatenate this score to the line string in the proper csv format.

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

# Edit header
header = input_file.readline()
header = header[:-1]
header += ", Average"
output_file.write(header + "\n")

# Edit data
for line in input_file:
    line = line[:-1]
    values = line.split(", ")

    # find average
    average = round((int(values[1]) + int(values[2])) / 2)
    line += ", " + str(average)
    print(line + "\n")

input_file.close()
output_file.close()

Testing and Writing File Output

Our program is now reading data correctly, calculating and adding the average score to each data line, and printing to the screen correctly (once we delete all other print() statements we created in development).

Finally, we replace the print statements with output_file.write; the parameters stay exactly the same:

# files3.py

input_file = open("grades.csv", "r")
output_file = open("grades2.csv", "w")

# Edit header
header = input_file.readline()
header = header[:-1]
header += ", Average"
output_file.write(header + "\n")

# Edit data
for line in input_file:
    line = line[:-1]
    values = line.split(", ")

    # find average
    average = round((int(values[1]) + int(values[2])) / 2)
    line += ", " + str(average)
    output_file.write(line + "\n")


input_file.close()
output_file.close()