The Data::Dumper stack smash (fixed)

Problems with data serializers were a major change in Mastering Perl. The Storable issue with malformed input was known for a long time, but nobody much cared about it. Now it’s Data::Dumper’s turn.

CVE-2014-4330 uncovered a problem with very deeply nested data structures. Past a certain depth, Data::Dumper’s recursion uses up the stack and perl crashes.

Here’s a program that builds up a nested series of array references based on the number I supply on the command line:

use Data::Dumper;
use v5.10;

# CVE-2014-4330

say STDERR 'Data::Dumper version is ', Data::Dumper->VERSION;

my $ref_1 = [ [] ];

foreach ( 1 .. $ARGV[0] ) {
	my $ref_2 = [ $ref_1, 'a' .. 'z' ];
	$ref_1 = $ref_2;
	}

say STDERR 'Made the data structure';

eval { print Dumper( $ref_1 ) } or warn $@;

say STDERR 'Got to the end';

If I use 10,000, the program completes, even though it takes a while:

$ perl5.20.0 data.pl 10000 >> /dev/null
Data::Dumper version is 2.151
Made the data structure
Got to the end

If I use 100,000, the program creates the data structure but fails almost immediately after that:

$ perl5.20.0 data.pl 100000 >> /dev/null
Data::Dumper version is 2.151
Made the data structure
Segmentation fault: 11

The eval can’t catch problems inside perl, such as perl trying to access memory it shouldn’t be able to. It can’t catch the segfault.
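Since the eval can’t help with a segfault, one way to keep the main program alive is to do the risky work in a child process and look at how the child ended. This is only a minimal sketch and not anything Data::Dumper provides: run_in_child is a hypothetical helper name, and the child here sends itself a SIGSEGV to stand in for a Dumper call that crashes perl.

```perl
use v5.10;
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Run a code reference in a child process and report how the child ended.
# run_in_child is a hypothetical helper, not part of any module.
sub run_in_child {
	my( $code ) = @_;

	my $pid = fork;
	die "Could not fork: $!" unless defined $pid;

	if( $pid == 0 ) {  # child
		my $ok = eval { $code->(); 1 };
		STDOUT->flush;                 # keep any output the code printed
		POSIX::_exit( $ok ? 0 : 1 );   # leave without running END blocks
		}

	waitpid $pid, 0;   # the parent waits here for the child
	my $status = $?;

	return {
		signal => $status & 127,  # non-zero if a signal killed the child
		exit   => $status >> 8,   # the child's exit status otherwise
		};
	}

# The child kills itself with SIGSEGV, standing in for a crashing dump
my $result = run_in_child( sub { kill 'SEGV', $$ } );

say STDERR $result->{signal}
	? "Child died from signal $result->{signal}"
	: "Child exited with status $result->{exit}";

say STDERR 'Got to the end';
```

The parent survives the child’s crash and can decide what to do next, although it still doesn’t get the dumped data.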

Steffen Müller fixed this pretty quickly by adding a default recursion limit of 1,000. Data::Dumper 2.154 is on CPAN now. If I use that, my program still isn’t successful, but the eval can catch the deep recursion:

$ perl5.20.0 -I../Data-Dumper-2.154/blib/lib data.pl 100000 >> /dev/null
Data::Dumper version is 2.154
Made the data structure
Recursion limit of 1000 exceeded at /Users/brian/Downloads/Data-Dumper-2.154/blib/lib/Data/Dumper.pm line 358.
Got to the end
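The limit is adjustable through $Data::Dumper::Maxrecurse (or the Maxrecurse method on a Data::Dumper object), and setting it to 0 removes the limit. Here’s a small sketch that assumes Data::Dumper 2.154 or later; nest and dump_with_limit are hypothetical helper names, and the tiny limit of 5 is only there to make the behavior easy to see.

```perl
use v5.10;
use strict;
use warnings;
use Data::Dumper;

# Wrap an empty array in $levels more array references.
# nest is a hypothetical helper for this illustration.
sub nest {
	my( $levels ) = @_;
	my $ref = [];
	$ref = [ $ref ] for 1 .. $levels;
	return $ref;
	}

# Dump with a temporary recursion limit, returning the string and any error.
# Assumes Data::Dumper 2.154 or later, which is where Maxrecurse appeared.
sub dump_with_limit {
	my( $data, $limit ) = @_;
	local $Data::Dumper::Maxrecurse = $limit;
	my $string = eval { Dumper( $data ) };
	return( $string, $@ );
	}

my( $shallow, $shallow_error ) = dump_with_limit( nest(2),  5 );
my( $deep,    $deep_error    ) = dump_with_limit( nest(10), 5 );

say STDERR 'Shallow structure: ', defined $shallow ? 'dumped' : "failed: $shallow_error";
say STDERR 'Deep structure: ',    defined $deep    ? 'dumped' : "failed: $deep_error";
```

Raising the limit only makes sense if I trust where the data came from, since the limit is what stands between a hostile structure and the segfault.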

I’m curious if there are real cases (I’ve heard some imagined ones described) where this would be a problem. On my big Mac Pro tower loaded to the gills with RAM, I can only get to 13,785 levels of nested arrays. I hate when Perl has these limits! Curiously, my puny MacBook Air can handle 14,966 levels.

Data::Dump::Streamer has different problems with these deep data structures. I tried it with the same basic program:

use Data::Dump::Streamer qw(Dump);
use v5.10;

say STDERR 'Data::Dumper version is ', Data::Dump::Streamer->VERSION;

my $ref_1 = [ [] ];

foreach ( 1 .. $ARGV[0] ) {
	my $ref_2 = [ $ref_1 ];
	$ref_1 = $ref_2;
	}

say STDERR 'Made the data structure';

eval { print Dump( $ref_1 ) } or warn $@;

say STDERR 'Got to the end';

When I try this with 100,000 levels, I get deep recursion warnings and the program appears to hang. The process size grew to about 13 GB, fell to 10 GB, then stayed there for several hours. This is a problem too: if someone can cause this program to run several times, they can hog all the memory on the machine so other processes can’t do their thing.

$ perl5.20.0 data-dump-streamer.pl 100000
Data::Dumper version is 2.38
Made the data structure
Deep recursion on subroutine "Data::Dump::Streamer::_dump_sv" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/darwin-2level/Data/Dump/Streamer.pm line 2676.
Deep recursion on subroutine "Data::Dump::Streamer::_dump_rv" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/darwin-2level/Data/Dump/Streamer.pm line 2383.
Deep recursion on subroutine "Data::Dump::Streamer::_dump_array" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/darwin-2level/Data/Dump/Streamer.pm line 2991.
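One partial guard against a dump that runs for hours is a time limit around the call. This sketch uses perl’s alarm and a die from the signal handler; with_timeout is a hypothetical helper name, and the busy loop stands in for the runaway dump. It’s only a sketch under some assumptions: it limits time, not memory, and perl defers signals until it is between operations, so it may not interrupt code that is stuck inside a long-running C function.

```perl
use v5.10;
use strict;
use warnings;

# Run a code reference, giving up after $seconds of wall-clock time.
# with_timeout is a hypothetical helper, not part of any module.
sub with_timeout {
	my( $seconds, $code ) = @_;

	my $result = eval {
		local $SIG{ALRM} = sub { die "Timed out after $seconds seconds\n" };
		alarm $seconds;
		my $value = $code->();
		alarm 0;     # finished in time, so cancel the alarm
		$value;
		};
	my $error = $@;
	alarm 0;         # also cancel it if the code died for another reason

	return( $result, $error );
	}

# A loop that never finishes stands in for the runaway dump
my( $value, $error ) = with_timeout( 1, sub {
	my $n = 0;
	while( 1 ) { $n++ }
	} );

warn $error if $error;
say STDERR 'Got to the end';
```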

Here’s a JSON::XS version of the program:

use JSON::XS qw(encode_json);
use v5.10;

say STDERR 'JSON version is ', JSON::XS->VERSION;

my $ref_1 = [ [] ];

foreach ( 1 .. $ARGV[0] ) {
	my $ref_2 = [ $ref_1 ];
	$ref_1 = $ref_2;
	}

say STDERR 'Made the data structure';

eval { print encode_json( $ref_1 ) } or warn $@;

say STDERR 'Got to the end';

It hits its recursion limit and dies, but the eval catches that and the program continues:

$ perl5.20.0 json.pl 1000 > /dev/null
JSON version is 3.01
Made the data structure
json text or perl structure exceeds maximum nesting level (max_depth set too low?) at json.pl line 15.
Got to the end
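As the error message hints, the limit is the encoder’s max_depth setting, which I can change through the object interface. This sketch uses JSON::PP, the pure-Perl module that comes with perl and offers the same max_depth method, so it runs without JSON::XS installed; swapping in JSON::XS->new should work the same way. The nest and encode_with_depth names are hypothetical helpers, and the small depths are only for illustration.

```perl
use v5.10;
use strict;
use warnings;
use JSON::PP;

# Wrap an empty array in $levels more array references.
# nest is a hypothetical helper for this illustration.
sub nest {
	my( $levels ) = @_;
	my $ref = [];
	$ref = [ $ref ] for 1 .. $levels;
	return $ref;
	}

# Encode with an explicit nesting limit, returning the JSON and any error.
# encode_with_depth is a hypothetical helper, not part of JSON::PP.
sub encode_with_depth {
	my( $data, $max_depth ) = @_;
	my $json = eval { JSON::PP->new->max_depth( $max_depth )->encode( $data ) };
	return( $json, $@ );
	}

my( $shallow, $shallow_error ) = encode_with_depth( nest(2),  5 );
my( $deep,    $deep_error    ) = encode_with_depth( nest(10), 5 );

say STDERR 'Shallow structure: ', defined $shallow ? $shallow : "failed: $shallow_error";
say STDERR 'Deep structure: ',    defined $deep    ? $deep    : "failed: $deep_error";
```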

What about YAML?

use YAML qw(Dump);
use v5.10;

say STDERR 'YAML version is ', YAML->VERSION;

my $ref_1 = [ [] ];

foreach ( 1 .. $ARGV[0] ) {
	my $ref_2 = [ $ref_1 ];
	$ref_1 = $ref_2;
	}

say STDERR 'Made the data structure';

eval { print Dump( $ref_1 ) } or warn $@;

say STDERR 'Got to the end';

YAML notices the deep recursion too, although at this depth perl only warns about it and the dump finishes:

$ perl5.20.0 yaml.pl 1000 > /dev/null
YAML version is 1.11
Made the data structure
Deep recursion on subroutine "YAML::Dumper::_prewalk" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/YAML/Dumper.pm line 187.
Deep recursion on subroutine "YAML::Dumper::_emit_node" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/YAML/Dumper.pm line 385.
Deep recursion on subroutine "YAML::Dumper::_emit_sequence" at /usr/local/perls/perl-5.20.0/lib/site_perl/5.20.0/YAML/Dumper.pm line 272.
Got to the end

Catching the recursion isn’t much better for the task, even if it protects the resources: I still don’t get my serialized data. But that’s how it is.