Collections, method chains and train wrecks (and SQL)

Last Friday, I got to teach about collections and closures in Ruby for ECC. That gave me the idea to write a post about one of the mistakes people coming from other languages tend to make when moving to Ruby.

Let’s take the first problem from Project Euler:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

Looks simple enough. In pseudocode, your typical fresh grad programmer might do this:

sum <- 0
for i <- 1 to 999
  if i % 3 == 0 or i % 5 == 0
    sum += i
  end if
end for

A Rubyist, however, will compress that six-line program into a single line. Here is one possible solution:

(1..999).select { |x| x % 3 == 0 or x % 5 == 0 }.reduce(:+)

This line of code chains the 3 main components of the algorithm above:

  1. (1..999) - find a way to process numbers from 1 to 999. Here we created a Range that we can process as a whole.
  2. .select { |x| x % 3 == 0 or x % 5 == 0 } - process only the multiples of 3 or 5. Here, select keeps only the elements for which the passed block returns true.
  3. .reduce(:+) - find the sum of the elements. Here we used the shorthand form of Ruby's reduce operation that sums the elements.
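
The three steps above can also be written out one at a time, which makes it easy to check the chain against the small example from the problem statement. This is only a sketch: the method name sum_of_multiples and its limit parameter are my own, not part of the one-liner.

```ruby
# Hypothetical helper: sum of the multiples of 3 or 5 below `limit`.
def sum_of_multiples(limit)
  # 1. the numbers to process (the three-dot range excludes `limit` itself)
  candidates = (1...limit)
  # 2. keep only the multiples of 3 or 5
  multiples = candidates.select { |x| x % 3 == 0 || x % 5 == 0 }
  # 3. add them up; the explicit 0 keeps an empty list from returning nil
  multiples.reduce(0, :+)
end

puts sum_of_multiples(10)    # 23, the example from the problem statement
puts sum_of_multiples(1000)  # 233168
```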

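Let's try a harder example, problem 6, which asks for the difference between the square of the sum and the sum of the squares of the first 100 natural numbers: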

(1..100).reduce(:+) ** 2 - (1..100).map { |x| x * x }.reduce(:+)

Here we see Ruby's map, which builds a new array by applying the mapping function to each element of the source collection. The map above is pretty trivial; we could even fold it into the long form of the reduce method:

(1..100).reduce(:+) ** 2 - (1..100).reduce(0) { |sum, x| sum + x * x }
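
As a quick sanity check, the two ways of summing squares can be put side by side and compared. The helper names below are made up for this sketch.

```ruby
# Hypothetical helpers comparing the two ways of summing squares.
def sum_of_squares_with_map(n)
  # map builds an array of squares, then the shorthand reduce adds them up
  (1..n).map { |x| x * x }.reduce(:+)
end

def sum_of_squares_with_reduce(n)
  # the block receives the running total and the next element; 0 is the starting total
  (1..n).reduce(0) { |sum, x| sum + x * x }
end

def problem_6(n)
  (1..n).reduce(:+) ** 2 - sum_of_squares_with_reduce(n)
end

puts sum_of_squares_with_map(10) == sum_of_squares_with_reduce(10)  # true
puts problem_6(10)   # 2640, the example given in the problem statement
puts problem_6(100)  # 25164150
```

The map version allocates an intermediate array; the reduce version does not, at the cost of a slightly busier block.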

While method chaining wouldn't be new to the novice developer, the concept of passing functions to methods, which allows greater flexibility, will be. Functional programming has long been forgotten even at the top universities in this country.

Another problem is that method chains can get too long. Some people call these chains "train wrecks". Obviously, this is a subjective matter, but one cannot deny that very long method chains are hard to debug. For example, here's one possible solution to problem 20, which asks for the sum of the digits of 100!:

(2..100).reduce(:*).to_s.scan(/./).map { |x| x.to_i }.reduce(:+)

This line simply:

  1. creates a range from 2 to 100 (1 is ignored in the factorial)
  2. calculates the factorial by multiplying the elements together
  3. converts it to a string
  4. creates an array whose elements are the single characters of the string (split("") also works)
  5. converts each element to an integer
  6. calculates the sum of the elements
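
The six steps above can be spelled out with one named value per step, which also lets us check the logic on a small input (the digits of 10! = 3628800 add up to 27). The method name factorial_digit_sum is an invention for this sketch.

```ruby
# Hypothetical helper: sum of the decimal digits of n!.
def factorial_digit_sum(n)
  factorial = (2..n).reduce(1, :*)       # steps 1-2: multiply 2..n (the 1 covers n < 2)
  text      = factorial.to_s             # step 3: convert to a string
  chars     = text.scan(/./)             # step 4: one string per character
  digits    = chars.map { |c| c.to_i }   # step 5: back to integers
  digits.reduce(:+)                      # step 6: add the digits
end

puts factorial_digit_sum(10)   # 27
puts factorial_digit_sum(100)  # 648
```

On Ruby 2.4 or later, Integer#digits and Enumerable#sum shorten steps 3 to 6 to (2..100).reduce(:*).digits.sum.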

One way of debugging this long method chain would be to insert a tap method call to inspect the intermediate value of the chain. For example, if you do this:

(2..100).reduce(:*).to_s.scan(/./).map { |x| x.to_i }
.tap { |x| puts x.inspect }.reduce(:+)

you'll get the array of numbers before the reduce.

irb(main):001:0> (2..100).reduce(:*).to_s.scan(/./).map { |x| x.to_i }
.tap { |x| puts x.inspect }.reduce(:+)
[9, 3, 3, 2, 6, 2, 1, 5, 4, 4, 3, 9, 4, 4, 1, 5, 2, 6, 8, 1, 6, 9, 9, 2, 3, 8, 8,
 5, 6, 2, 6, 6, 7, 0, 0, 4, 9, 0, 7, 1, 5, 9, 6, 8, 2, 6, 4, 3, 8, 1, 6, 2, 1,
 4, 6, 8, 5, 9, 2, 9, 6, 3, 8, 9, 5, 2, 1, 7, 5, 9, 9, 9, 9, 3, 2, 2, 9, 9, 1, 5,
 6, 0, 8, 9, 4, 1, 4, 6, 3, 9, 7, 6, 1, 5, 6, 5, 1, 8, 2, 8, 6, 2, 5, 3, 6, 9, 7,
 9, 2, 0, 8, 2, 7, 2, 2, 3, 7, 5, 8, 2, 5, 1, 1, 8, 5, 2, 1, 0, 9, 1, 6, 8, 6,
 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
=> 648

Not exactly pretty, nor is it the most interesting use of tap, but it still gets the work done.
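
A slightly tidier variation, sketched here with a made-up helper named traced_digit_sum, is to tap at several points and store the intermediate values in a hash instead of printing them. Since tap always returns its receiver, the chain's result is unaffected.

```ruby
# Hypothetical helper: runs the digit-sum chain for n! and records
# the value flowing through the chain at three points.
def traced_digit_sum(n)
  trace = {}
  result = (2..n).reduce(1, :*)
    .tap { |v| trace[:factorial] = v }   # the factorial itself
    .to_s
    .scan(/./)
    .tap { |v| trace[:chars] = v }       # array of one-character strings
    .map { |x| x.to_i }
    .tap { |v| trace[:digits] = v }      # array of integers
    .reduce(:+)
  [result, trace]
end

result, trace = traced_digit_sum(5)
p trace[:factorial]  # 120
p trace[:digits]     # [1, 2, 0]
p result             # 3
```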

As a bonus, I'd just like to share a realization I had a while back.

Web developers shouldn't have trouble with list processing, because they deal with lists all the time: in SQL!

Think about it: filters go in the WHERE clause, while map and reduce can be done in the SELECT clause. Assuming you have a table numbers with a column number and 100 records, one for each number from 1 to 100, problem 6 can be solved by the following SQL statement:

SELECT SUM(number) * SUM(number) - SUM(number * number) FROM numbers
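
To make the correspondence concrete, here is a rough Ruby sketch that treats an array of hashes as the numbers table. The helper names are invented, and the second SQL statement in the comments is my own illustration (it assumes a database that supports the % operator).

```ruby
# Hypothetical in-memory stand-in for the `numbers` table: one row per number.
def numbers_table
  (1..100).map { |n| { number: n } }
end

# SELECT SUM(number) * SUM(number) - SUM(number * number) FROM numbers
def problem_6_like_sql(rows)
  sum         = rows.map { |row| row[:number] }.reduce(0, :+)                 # SUM(number)
  sum_squares = rows.map { |row| row[:number] * row[:number] }.reduce(0, :+)  # SUM(number * number)
  sum * sum - sum_squares
end

# SELECT SUM(number) FROM numbers WHERE number % 3 = 0 OR number % 5 = 0
def multiples_like_sql(rows)
  rows.select { |row| row[:number] % 3 == 0 || row[:number] % 5 == 0 }  # WHERE
      .map { |row| row[:number] }                                        # SELECT number
      .reduce(0, :+)                                                     # SUM(...)
end

puts problem_6_like_sql(numbers_table)  # 25164150
puts multiples_like_sql(numbers_table)  # 2418
```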

Things To Do This New Year: Software Engineering

You know the drill.

Learn a new language to complement your programming skills.

It would be a typical New Year’s resolution for developers to learn a new programming language this year. But seriously, what’s the point of learning C# when you’re a Java developer (or vice versa)?

What you should be striving for are programming languages that are orthogonal to your current skill set. If you’re an enterprise developer used to statically typed OO programming languages, try dynamic languages like Python and Ruby. If you’re already using dynamic languages, try your hand at functional programming with languages like Erlang and Scala. The same goes for platforms: web developers might want to try programming RIAs.

The point here isn’t to add bullet points to your resume, but to have different ways of looking at problems, like adding new tools to a toolbox. For example, had I not been aware of the basics of functional programming, I might have tried to force traditional Java-like synchronization techniques in my Google Wave gadgets instead of the more elegant FP approach.

Just a short plug:

Rapid Development’s Classic Mistakes (in software development) was a real eye-opener for me when I read it four years ago. Even though it was written almost a decade before that, a lot of the mistakes listed there were still present in my company.

To keep the list up to date, Construx (Steve McConnell’s company) is now holding the Classic Mistakes survey for 2010. Help update the study by taking the survey here.

Inject/Reduce doesn’t work on lists of ActiveRecords

Problem:

The following line of code for calculating the total order cost returns “#”.

@purchase_items.reduce() {|cost, item| cost += item.qty_ordered * item.unit_price }

Cause:

When no initial value is passed, Ruby's reduce uses the first item in the list as the initial memo and starts iterating from the second item. Here that means the memo is an ActiveRecord object rather than a number.
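
The behavior is easy to reproduce outside Rails. In the sketch below, a plain Struct named PurchaseItem stands in for the ActiveRecord model; that stand-in and the method name are assumptions, not the real application code.

```ruby
# Hypothetical stand-in for the ActiveRecord model, with only the two columns used above.
PurchaseItem = Struct.new(:qty_ordered, :unit_price)

def total_without_initial(items)
  # No initial value: the first item becomes the memo.
  items.reduce { |cost, item| cost + item.qty_ordered * item.unit_price }
end

# With a single item the block never runs, so reduce hands back the item itself.
one_item = [PurchaseItem.new(2, 10)]
p total_without_initial(one_item)  # prints the struct, not 20

# With more items, the block tries to call + on a PurchaseItem.
two_items = [PurchaseItem.new(2, 10), PurchaseItem.new(1, 5)]
begin
  total_without_initial(two_items)
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```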

Solution:

You can set the initial memo item by passing it to the reduce function.

@purchase_items.reduce(0) {|cost, item| cost + item.qty_ordered * item.unit_price }
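
If the items are in a plain array and you are on Ruby 2.4 or later, Enumerable#sum with a block is another option, since it starts from 0 by default. The LineItem struct and order_total method below are hypothetical stand-ins for the real model and controller code.

```ruby
# Hypothetical stand-in for the model, with only the two columns used above.
LineItem = Struct.new(:qty_ordered, :unit_price)

def order_total(items)
  # sum starts from 0, so an empty order yields 0 instead of nil
  items.sum { |item| item.qty_ordered * item.unit_price }
end

puts order_total([LineItem.new(2, 10), LineItem.new(1, 5)])  # 25
puts order_total([])                                         # 0
```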