Session 9

Reading files

(Note: you can find all the files you need for this class under the "Files" button at the bottom of the page.)

There's a lot of data out there, and you can write programs to do useful things with it. But how? If you are lucky, the data is in a standard format. One simple format is CSV, which stands for comma-separated values. The data is a set of records, one per line, and the values (or columns) within each record are separated by commas.
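To make "one record per line, values separated by commas" concrete, here is a minimal sketch that turns a small CSV string into an array of records. The column names and values are made up for illustration, and `parse_records` is a hypothetical helper, not part of Ruby.

```ruby
# A tiny CSV document: one record per line, values separated by commas.
# The column names and values are made up for illustration.
csv_text = "name,color,count\napple,red,3\nbanana,yellow,5\n"

# Hypothetical helper: turn each line into an array of values.
# chomp removes the newline at the end of each line before splitting.
def parse_records(text)
  text.each_line.map { |line| line.chomp.split(',') }
end

records = parse_records(csv_text)
records.each { |record| p record }
```

Note that every value comes back as a string (`"3"`, not `3`). This simple approach also assumes no value contains a comma of its own; for CSV files with quoted fields, Ruby's standard `csv` library is the safer choice.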

The US government releases all kinds of interesting data for free. One such source is the US Census, which is taken every ten years. They are already getting ready for the next one in 2010, but the data from the 2000 Census is available on the web.

Say you wanted to know how popular your last name is. How does it rank against all the other surnames in the US? The file app_c.csv lists every surname recorded in the 2000 Census that was shared by at least 100 people. See also this web page.

Here's a program that reads through that file using Ruby's File class, looking for the name you entered. Note that we are using a new String method called split. String#split breaks a string apart at each occurrence of a separator that you supply and returns the pieces as an array of strings. Since we are looking at comma-separated values, we use a comma as the separator. We also call chomp before splitting, to remove the newline at the end of each line.
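A quick sketch of how split behaves on a line read from a file. The record below is hypothetical; it just has the same shape (name, rank, and more columns) as the lines the program splits. The last line shows `split(' ')`, which the presidents example later in this session uses to pull words out of a name.

```ruby
line = "ALPHA,1,500\n"   # hypothetical record, as read from a file (note the newline)

# Without chomp, the newline stays attached to the last value.
without_chomp = line.split(',')
# With chomp, the trailing newline is removed first.
with_chomp = line.chomp.split(',')

p without_chomp   # prints ["ALPHA", "1", "500\n"]
p with_chomp      # prints ["ALPHA", "1", "500"]

# split(' ') splits on runs of whitespace, which is handy for the words in a name.
p "John   Quincy Adams".split(' ')   # prints ["John", "Quincy", "Adams"]
```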

So here's the program.

filename = 'app_c.csv'
file = File.open(filename)

puts "What's your surname?"
your_name = gets.chomp.upcase   # the census file stores surnames in upper case

found = false
first_line = true
file.each do |line|
  # The first line is a header row, not a name, so skip it
  if first_line
    first_line = false
    next
  end

  # Column 0 is the surname and column 1 is its rank
  data = line.chomp.split(',')
  name = data[0]
  rank = data[1]

  if your_name == name
    puts "Your surname is ranked number #{rank} among the names of count >= 100 in the 2000 Census"
    found = true
    break   # no need to read the rest of the file
  end
end
file.close

puts "I could not find your surname among the names of count >= 100 in the 2000 Census" unless found
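The loop above is a linear search: it reads records one at a time and stops at the first match. Here is a minimal sketch of the same idea wrapped in a method. `find_rank` is a hypothetical name, and the sketch swaps in `gets` to skip the header line instead of the `first_line` flag. A `StringIO` (with made-up sample data) stands in for the file; a real File object supports the same `gets` and `each_line` calls.

```ruby
require 'stringio'

# Linear search: read records one by one and stop at the first match.
# Returns the rank string, or nil if the surname is not present.
def find_rank(io, surname)
  io.gets                              # read and discard the header line
  io.each_line do |line|
    name, rank = line.chomp.split(',')
    return rank if name == surname     # stop reading as soon as we find it
  end
  nil
end

# Hypothetical sample data standing in for app_c.csv
sample = StringIO.new("name,rank\nALPHA,1\nBETA,2\n")
p find_rank(sample, 'BETA')   # prints "2"
```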

Here's another version that uses a Hash. How are these programs different?

filename = 'app_c.csv'
file = File.open(filename)

name_data = Hash.new
first_line = true

file.each do |line|
  # Skip the header row
  if first_line
    first_line = false
    next
  end

  data = line.chomp.split(',')
  name = data[0]
  rank = data[1]

  # Remember every surname's rank, using the surname as the key
  name_data[name] = rank
end
file.close

puts "What's your surname?"
name = gets.chomp.upcase
if name_data.has_key?(name)
  rank = name_data[name]
  puts "Your surname is ranked number #{rank} among the names of count >= 100 in the 2000 Census"
else
  puts "I could not find your surname among the names of count >= 100 in the 2000 Census"
end
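One difference worth experimenting with: the Hash version reads the whole file once, and after that every lookup is just a hash access, so you could answer many questions without reading the file again. A minimal sketch, assuming a hypothetical `load_ranks` method and made-up sample data in a `StringIO`; `Hash#fetch` with a default value stands in for the `has_key?` check.

```ruby
require 'stringio'

# Read every record once and store surname => rank in a Hash.
def load_ranks(io)
  ranks = {}
  io.gets                      # skip the header line
  io.each_line do |line|
    name, rank = line.chomp.split(',')
    ranks[name] = rank
  end
  ranks
end

ranks = load_ranks(StringIO.new("name,rank\nALPHA,1\nBETA,2\n"))

# After loading, each lookup is a hash access; the file is not re-read.
%w[ALPHA GAMMA].each do |surname|
  puts "#{surname}: #{ranks.fetch(surname, 'not found')}"
end
```

The trade-off is memory: the whole list of names sits in the hash, while the first version only ever holds one line at a time.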

Writing files

Now let's try our hand at writing files. Notice how you opened a file for reading with the File.open method? You can open a file for writing with the same method. You just pass a second argument, the mode string 'w', to say that you want to write to the file instead of reading it. Be careful: 'w' creates the file if it doesn't exist and erases its old contents if it does. Once you have an open file, you can call the puts method on it to write strings to it, one line at a time.
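Here is a small sketch of 'w' mode in action, including the fact that opening an existing file with 'w' replaces what was there. `write_report` and the file name are hypothetical; the file goes in a temporary directory so nothing in your working folder is touched.

```ruby
require 'tmpdir'

# Open a file in 'w' (write) mode, write some lines with puts, and close it.
def write_report(path, lines)
  file = File.open(path, 'w')            # creates the file, or empties it if it exists
  lines.each { |line| file.puts line }   # puts adds a newline after each line
  file.close
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'report.txt')    # hypothetical file name
  write_report(path, ['first', 'second'])
  write_report(path, ['only this'])      # 'w' mode replaces the old contents
  p File.read(path)                      # prints "only this\n"
end
```

If you want to add to the end of an existing file instead, open it with the mode 'a' (append).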

Note that I have to close the file when I'm done writing to it. Writes are buffered, and closing the file makes sure everything actually gets written out.
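Ruby also lets you pass a block to File.open. The file is open inside the block and is closed automatically when the block ends, even if an error is raised inside it, so you can't forget the close. A minimal sketch (`write_lines` and the file name are hypothetical); the method returns the File object only so we can check that it was closed.

```ruby
require 'tmpdir'

# Block form of File.open: Ruby closes the file when the block ends.
def write_lines(path, lines)
  File.open(path, 'w') do |file|
    lines.each { |line| file.puts line }
    file          # the block's value is returned by File.open
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'names.txt')   # hypothetical file name
  f = write_lines(path, ['Adams', 'Lincoln'])
  p f.closed?                          # prints true
  puts File.read(path)
end
```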

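I found a file listing the US Presidents, and I wanted to know how their surnames ranked. Here's the code. It writes each president's surname and rank to president_name_ranks.txt, using a rank of 0 for surnames that aren't in the census list.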

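president_file = 'presidents.txt'
name_file = 'app_c.csv'

president_surnames = []
name_data = {}

# The code assumes each line starts with a president's full name,
# followed by a comma. The surname is the last word of that name.
# File.foreach opens the file, yields each line, and closes it.
File.foreach(president_file) do |line|
  next if line.strip.empty?            # skip blank lines
  name = line.chomp.split(',').first
  surname = name.split(' ').last
  president_surnames << surname
end

# Build the same surname => rank hash as before
first_line = true
File.foreach(name_file) do |line|
  if first_line
    first_line = false
    next
  end

  data = line.chomp.split(',')
  name = data[0]
  rank = data[1]

  name_data[name] = rank
end

# Write one "surname,rank" line per president; 0 means "not in the census list"
output = File.open('president_name_ranks.txt', 'w')
president_surnames.each do |surname|
  key = surname.upcase                 # census surnames are upper case
  if name_data.has_key?(key)
    output.puts "#{surname},#{name_data[key]}"
  else
    output.puts "#{surname},0"
  end
end

output.close
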
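The two trickiest steps in that program are pulling the surname out of a line and choosing a rank of 0 when the name is missing. Here is a minimal sketch of both as small methods. The line format (full name, then a comma, then other fields) is an assumption inferred from the code, the sample line and rank are made up, and `surname_from` and `rank_for` are hypothetical names. `Hash#fetch` with a default replaces the `has_key?` / `else` branch.

```ruby
# ASSUMPTION: each line starts with the full name, followed by a comma
# and other fields. The sample line below is made up.
def surname_from(line)
  full_name = line.chomp.split(',').first
  full_name.split(' ').last
end

# Look up a surname's rank, using "0" when it is not in the census list,
# the same convention as the program above.
def rank_for(name_data, surname)
  name_data.fetch(surname.upcase, '0')
end

name_data = { 'EXAMPLE' => '42' }   # made-up rank for illustration
surname = surname_from("Jane Q. Example,other fields\n")
puts "#{surname},#{rank_for(name_data, surname)}"   # prints Example,42
puts "Nobody,#{rank_for(name_data, 'Nobody')}"      # prints Nobody,0
```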
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License