In Ten numbers on a blackboard, someone asks about the largest number you can compute by reducing a set of numbers. I was surprised to see that someone spent quite a bit of time to brute force it and that their Python solution was so slow. So many of the things I write about in the “Benchmarking” and “Profiling” chapters come into play in this conversation.
The particular problem is to reduce a list of numbers by taking them off of the list and replacing them with the single value a*b + a + b. Expressed in Perl, I write:
my @array = 1 .. 10;
my $sum = 0;
foreach ( @array ) {
$sum = $sum * $_ + $sum + $_;
}
print "Sum is $sum\n";
The answer for that sequence is 39,916,799.
There’s a bit of a trick here by starting with $sum at 0. The first loop gets me back to the first item in the list. It’s a nice trick in programming to include a degenerate case and even nicer that Perl allows and even expects me to start with an undefined value. I could just as well start with the first element directly by shifting it into $sum:
my @array = 1 .. 10;
my $sum = shift @array;
foreach ( @array ) {
$sum = $sum * $_ + $sum + $_;
}
print "Sum is $sum\n";
Still, the question remains. Would I get the same answer no matter the order of the numbers in the sequence? I can generate all of the permutations of the list, something explained in the perlfaq4’s How do I permute N elements of a list? as well as StackOverflow’s How can I generate all permutations of an array in Perl?.
For this article, I’ll use Algorithm::Permute. It’s a bit of an odd interface because its permute function takes a code block where I get the permutation in an array variable of the same name:
use Algorithm::Permute ('permute');
my @array = 1 .. 10;
permute {
print "@array\n";
} @array;
Careful here! There are 10! permutations of the order of 10 distinct thingys. That’s 3,628,8000 lines of output. Let’s start with something a bit smaller. I’ll start with two items:
use Algorithm::Permute ('permute');
@ARGV = 1 unless @ARGV;
my @array = 1 .. $ARGV[0];
permute {
print "S = [@array] ";
my $sum = $array[0];
foreach ( 1 .. $#array ) {
$sum = $sum * $array[$_] + $sum + $array[$_];
}
print "sum is $sum\n";
} @array;
Here are a few runs where I see the sum is the same each time (more on that in a moment):
% permute 2 S = [1 2] sum is 5 S = [2 1] sum is 5
% permute 3 S = [1 2 3] sum is 23 S = [1 3 2] sum is 23 S = [3 1 2] sum is 23 S = [2 1 3] sum is 23 S = [2 3 1] sum is 23 S = [3 2 1] sum is 23
% permute 4 S = [1 2 3 4] sum is 119 S = [1 2 4 3] sum is 119 S = [1 4 2 3] sum is 119 S = [4 1 2 3] sum is 119 S = [1 3 2 4] sum is 119 S = [1 3 4 2] sum is 119 S = [1 4 3 2] sum is 119 S = [4 1 3 2] sum is 119 S = [3 1 2 4] sum is 119 S = [3 1 4 2] sum is 119 S = [3 4 1 2] sum is 119 S = [4 3 1 2] sum is 119 S = [2 1 3 4] sum is 119 S = [2 1 4 3] sum is 119 S = [2 4 1 3] sum is 119 S = [4 2 1 3] sum is 119 S = [2 3 1 4] sum is 119 S = [2 3 4 1] sum is 119 S = [2 4 3 1] sum is 119 S = [4 2 3 1] sum is 119 S = [3 2 1 4] sum is 119 S = [3 2 4 1] sum is 119 S = [3 4 2 1] sum is 119 S = [4 3 2 1] sum is 119
Rather than output every sum, I modify the program to find a sum that’s not the same. If I can find one that’s different I know that I might have to brute-force the problem to find the maximum:
use v5.10;
use Algorithm::Permute ('permute');
@ARGV = 1 unless @ARGV;
my @array = 1 .. $ARGV[0];
permute {
my $sum = $array[0];
foreach ( 1 .. $#array ) {
$sum = $sum * $array[$_] + $sum + $array[$_];
}
state $last_sum = $sum;
say "Sum for [@array] is special: $sum"
unless $last_sum eq $sum;
} @array;
I like this trick with state where I initialize it with the first value of $sum, but with this code I don’t see any output. Instead, I can use a hash to count the sums that I compute. If I accumulate more than one sum, I do more work to find the minimum and maximum values:
use v5.10;
use Algorithm::Permute ('permute');
use List::Util qw(min max);
@ARGV = 1 unless @ARGV;
my @array = 1 .. $ARGV[0];
my %Sums;
permute {
my $sum = $array[0];
foreach ( 1 .. $#array ) {
$sum = $sum * $array[$_] + $sum + $array[$_];
}
$Sums{$sum}++;
} @array;
if( 1 < values %Sums ) {
print "There is more than one sum for n=$ARGV[0]!";
my $min = min( keys %Sums );
my $max = max( keys %Sums );
say "Min: $min Max: $max";
}
else {
say "There is only one sum for n=$ARGV[0]" . (keys %Sums)[0];
}
Now for the brute-force. This is the part that caught my eye about imallett's Python solution, which he says:
For what it's worth, after reading the answers I decided to see how long a brute force program would take. My solution takes 0.43 seconds on the analogous problem of length 7, 12.3 seconds for length 8, and 442.1 seconds for length 9. I didn't let it run for length 10.
There are so many odd things about that statement? Twelve seconds? Doesn't that seem odd for 7! (=5040) operations? Remember in Mastering Perl that I strongly emphasized that we can only compare times on the same setups. But, even without trying very hard, my solution is three orders of magnitude faster:
% time perl permute 7 There is only one sum: 40319 real 0m0.033s user 0m0.027s sys 0m0.005s
One of imallett's problems is his use of recursion, a popular technique that's great for high-level languages that can compile it into something that isn't recursive. This isn't a problem that needs it. Mark Jason Dominus talks about this quite a bit in Higher-Order Perl too.
Here's imallett Python solution, which I find hard to unravel:
#!/usr/bin/python
import time
def rem_comb(i,j, l):
l2 = []
for index in range(len(l)):
if index == i or index == j: continue
l2.append(l[index])
a,b = l[i],l[j]
l2.append(a*b + a + b)
return l2
def experiment(l):
if len(l) == 1:
return l[0]
else:
best = (0, None,None)
for i in range(len(l)):
for j in range(i+1,len(l),1):
value = experiment(rem_comb(i,j, l))
if value > best[0]:
best = (value, i,j)
return best[0]
for length in range(1,10+1,1):
t0 = time.time()
value = experiment(list(range(1,length+1,1)))
t1 = time.time()
print("Length % 2d best %d (time %f)"%(length,value,t1-t0))
For N=7, it takes less than a second, but then it blows up at N=8 on my machine. These are roughly the same times he reports, but this is awful performance:
Length 1 best 1 (time 0.000006) Length 2 best 5 (time 0.000015) Length 3 best 23 (time 0.000043) Length 4 best 119 (time 0.000223) Length 5 best 719 (time 0.002201) Length 6 best 5039 (time 0.032876) Length 7 best 40319 (time 0.674944) Length 8 best 362879 (time 18.632982) Length 9 best 3628799 (time 683.095185)
I also strongly emphasized that in these sorts of issues, it's the algorithm that's usually the problem, not the language or the interpreter. I haven't written Python in a long time (and now I remember why), but this solution is much faster:
#!/usr/bin/python import time from itertools import permutations def add_it_up(p): sum = 0; for i in p: sum = sum * i + sum + i return sum def experiment(l): sum = 1; last_sum = 0 for p in permutations( range( 1, l+1 ) ): sum = add_it_up( p ) if last_sum > 0 and sum != last_sum: print( p ) print( "For %d, found two sums: %d %d" % ( l, sum, last_sum ) ) last_sum = sum return last_sum for length in range(1, 10+1, 1): t0 = time.time() value = experiment( length ) t1 = time.time() print( "Length % 2d best %s (time %f)" % (length,value,t1-t0) )
This leads to the advice I gave in Mastering Perl. Better algorithms are better than optimization. The accepted answer for this puzzle shows that for 1 .. N, the sum is N! - 1. More importantly, it shows that since all of the operations are commutative, the order doesn't matter so I don't have to compute every sum.
Now I divert from the question, which already has its answer. Now I merely want to make the sums to see how fast I can do it. I already show in the "Profiling" chapter how to compute factorials while saving the result from previous computations (you can see all of the book's programs on the Downloads page:
#!/usr/bin/perl
# factorial_iterate_bignum_memo.pl
use bignum;
{
my @Memo = (1);
sub factorial {
my $number = shift;
return unless int( $number ) == $number;
return $Memo[$number] if $Memo[$number];
foreach ( @Memo .. $number ) {
$Memo[$_] = $Memo[$_ - 1] * $_;
}
$Memo[ $number ];
}
}
{
print "Enter a number> ";
chomp( my $number = <STDIN> );
exit unless defined $number;
print factorial( $number ), "\n";
redo;
}
As I continued to brute-force the problem, I would have to redo quite a bit of work to compute factorials I already know the answer to, but there's no reason to do all that extra work. I modify the factorial program a little (and add some v5.10 features):
use v5.10;
sub factorial {
my $number = shift;
state $Memo = [1];
return unless int( $number ) == $number;
return $Memo->[$number] if $Memo->[$number];
foreach ( @$Memo .. $number ) {
$Memo->[$_] = $Memo->[$_ - 1] * $_;
}
$Memo->[ $number ];
}
foreach ( 1 .. $ARGV[0] ) {
say "N = $_, sum is ", factorial( $_ ) - 1;
}
Now I know the sums virtually instantly:
% time perl factorial 100 N = 1, sum is 0 N = 2, sum is 1 N = 3, sum is 5 N = 4, sum is 23 N = 5, sum is 119 N = 6, sum is 719 N = 7, sum is 5039 N = 8, sum is 40319 N = 9, sum is 362879 N = 10, sum is 3628799 ... N = 99, sum is 9.33262154439441e+155 N = 100, sum is 9.33262154439441e+157 real 0m0.007s user 0m0.003s sys 0m0.003s
Suppose now, that I did have to check every sum because the operations weren't commutative and the order mattered. How would I make everything faster? I'd do the same thing I did with the factorial example. I would save the result of previous operations, even across different-sized inputs. First, I'll comment out the meat (at line 15) and count the number of calls for each input length:
use v5.10;
use Algorithm::Permute;
my $Memo;
my %Calls;
sub make_key { join "\000", @_ }
sub in_order {
my $key = make_key( @_ );
$Calls{ scalar @_ }++;
if( exists $Memo->{$key} ) {
#return $Memo->{$key};
}
if( @_ == 2 ) {
return $Memo->{$key} = $_[0]*$_[1] + $_[0] + $_[1];
}
elsif( @_ == 1 ) {
return $_[0]
}
$Memo->{$key} = in_order(
in_order( @_[0 .. $#_-2] ),
in_order( @_[-2, -1] )
);
}
N: foreach my $n ( 2 .. $ARGV[0] ) {
my $p = Algorithm::Permute->new( [ 1 .. $n ], $n );
my $last_sum;
PERMUTE: while( my @res = $p->next ) {
my $key = make_key( @_ );
$Memo{ $key } = in_order( @res );
if( defined $last_sum and $Memo{ $key } != $last_sum ) {
say "For (@res), sum is different [$last_sum != $Memo{ $key }]";
next N;
}
$last_sum = $Memo{ $key };
}
say "N = $n: sum is $last_sum";
}
say Dumper( \%Calls ); use Data::Dumper;
This is still pretty slow, but much faster than the recursive solution in Python:
% time perl memo 10
N = 2: sum is 5
N = 3: sum is 23
N = 4: sum is 119
N = 5: sum is 719
N = 6: sum is 5039
N = 7: sum is 40319
N = 8: sum is 362879
N = 9: sum is 3628799
N = 10: sum is 39916799
$VAR1 = {
'3' => 368046,
'1' => 368046,
'6' => 3669840,
'5' => 368040,
'10' => 3628800,
'8' => 3669120,
'9' => 362880,
'4' => 3669864,
'2' => 35878886,
'7' => 367920
};
real 2m53.379s
user 2m51.615s
sys 0m1.697s
Now, I'll uncomment that line and try again. I cut the time in half, roughly, and see there are fewer calls. I strongly emphasize this in the Mastering Perl too. If I think I know the situation, I should make a change to ensure what I understand to happen actually does:
% time perl memo 10
N = 2: sum is 5
N = 3: sum is 23
N = 4: sum is 119
N = 5: sum is 719
N = 6: sum is 5039
N = 7: sum is 40319
N = 8: sum is 362879
N = 9: sum is 3628799
N = 10: sum is 39916799
$VAR1 = {
'8' => 3669120,
'2' => 12323810,
'6' => 1815120,
'5' => 181560,
'1' => 504,
'9' => 362880,
'3' => 15126,
'7' => 367920,
'10' => 3628800,
'4' => 151224
};
real 1m27.931s
user 1m26.666s
sys 0m1.251s