PDFedit Bugtracker
  

ID: 0000350
Category: [PDFedit] =Other (Kernel)=
Severity: minor
Reproducibility: always
Date Submitted: 04-23-10 08:39
Last Update: 05-03-10 22:50
Reporter: misuj1am
View Status: public
Assigned To: hockm0bm
Priority: normal
Resolution: fixed
Status: resolved
Product Version:
Summary: 0000350: invalid xref entries are produced if we are reusing an unused object number
Description: Trying to open the attached PDF file results in an exception. Adobe has no problems.
Additional Information:
Attached Files:
 xpdf_constructxref_unsigned_int.patch (1,835 bytes) 04-23-10 19:53
 kazajka.pdf (233,041 bytes) 04-23-10 22:00
 gen_overflow_fixes.patch (975 bytes) 04-26-10 11:27
 xref_uninitialized_gen.patch (1,860 bytes) 04-26-10 11:53
 pdfwriter_gen_overflow.patch (1,897 bytes) 04-26-10 13:38

- Relationships
related to 0000351 resolved hockm0bm Kernel XRef::constructXRef doesn't detect stream lengths correctly

- Notes
(0001010)
hockm0bm
04-23-10 16:59

First of all, xref table is mangled:

xref
225 1
0000149153 -84215045 <<<<<<<<<<<
227 1
0000149338 00000 n
trailer

This, however, is not the critical thing because XRef::constructXRef will succeed, so we still have an overview of the objects' positions.

The reason we fail in checkLinearized is that the stream for object [2 0], which is one of the checked objects, has a bad length (off by one, actually: 117 instead of 116). This number is retrieved from XRef::streamEnds, which is built during constructXRef.
It is hard to tell whether this number is read incorrectly or the stream really has a wrong length.
 
(0001011)
hockm0bm
04-23-10 17:29

I have tried to turn off XRef::getStreamEnd (by simply returning gFalse right at the beginning) to rule out a bad stream length calculation (note that this code path is used _only_ for damaged documents), and checkLinearized doesn't fail then.
Nevertheless, we end up in the very same situation as xpdf/kpdf: only an empty page is displayed and the xpdf code complains about weird page contents:

Error: Weird page contents
Error: Weird page contents

This message is printed by Gfx::display if the given object is not a stream or an array of streams. In this case we have an array and the problem is in item[8],
which is Ref [225 -842150451]. It is resolved to null because of a generation number mismatch (the one stored in XRef::entries is 2147483647).

I am not sure whether this generation number is valid or whether there is a sign overflow bug in the parser.
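
To make the mismatch concrete, here is a minimal sketch of a lookup that refuses a reference whose generation number differs from the stored one. The `Entry`, `Ref` and `resolve` names are hypothetical simplifications for illustration and are not the actual xpdf types or API; the numbers are the ones quoted in this note.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for an xref entry and an indirect reference.
struct Entry {
    unsigned int offset;  // byte offset of the object in the file
    int gen;              // generation number stored in the table
    bool inUse;           // false for free entries
};

struct Ref {
    int num;
    int gen;
};

// Returns the object's offset, or std::nullopt (the analogue of resolving
// to a null object) when the number is out of range, the entry is free,
// or the generation numbers differ.
std::optional<unsigned int> resolve(const std::vector<Entry>& entries, Ref ref) {
    if (ref.num < 0 || static_cast<std::size_t>(ref.num) >= entries.size()) {
        return std::nullopt;
    }
    const Entry& e = entries[static_cast<std::size_t>(ref.num)];
    if (!e.inUse || e.gen != ref.gen) {
        return std::nullopt;
    }
    return e.offset;
}
```

With an entry for object 225 that stores gen 2147483647, `resolve(entries, Ref{225, -842150451})` yields no value, which corresponds to the null object that triggers the "Weird page contents" message.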

Anyway, where is this document from?
 
(0001012)
hockm0bm
04-23-10 18:53

OK, so this clearly doesn't comply with the specification:

"
 A non-negative integer generation number. In a newly created file, all indirect
 objects have generation numbers of 0. Nonzero generation numbers may be
 introduced when the file is later updated; see Sections 3.4.3, “Cross-Reference
 Table,” and 3.4.5, “Incremental Updates.”
"
 
(0001013)
hockm0bm
04-23-10 19:25

Ohh, wait a moment.

The xref table is considered mangled (n or f is missing at the end of the line) and is therefore ignored; XRef::constructXRef doesn't care about it and tries to build the xref from all existing objects in the file. And the object with num=225 has generation number 3452816845 (-842150451 when read as a signed 32-bit integer; the table shows it truncated to -84215045).

constructXRef uses atoi for number conversion, which means that 3452816845 (which doesn't fit into the int type) is converted to 0x7fffffff, i.e. INT_MAX.

Appendix C of the PDF specification says that an Integer value is limited to a signed 32-bit integer. There is no mention of a generation number limit, though.

I am not sure whether this is something worth fixing...
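
For reference, a small sketch of how the two numbers above relate and how an overflow-aware conversion could look. `asSigned32` and `parseGen` are hypothetical helpers, not functions from xpdf or from the attached patch, and the 65535 upper bound is an assumption taken from the cross-reference table description in the PDF Reference rather than from this thread.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <optional>

// Reinterpret a 32-bit pattern as a signed value (two's complement assumed).
std::int32_t asSigned32(std::uint32_t bits) {
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical helper: parse a generation number and reject anything that
// is not a plain decimal number in [0, 65535], instead of silently
// clamping the way an atoi-style conversion may.
std::optional<int> parseGen(const char* text) {
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return std::nullopt;  // no digits at all, or trailing junk
    }
    if (errno == ERANGE) {
        return std::nullopt;  // does not even fit into long
    }
    if (value < 0 || value > 65535) {
        return std::nullopt;  // outside the assumed generation number range
    }
    return static_cast<int>(value);
}
```

Here `asSigned32(3452816845u)` gives -842150451, which is the value seen in the Ref, and `parseGen("3452816845")` reports a failure instead of returning INT_MAX.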
 
(0001014)
hockm0bm
04-23-10 19:52

If we want to hack around that then the following patch should help.

Please note that it doesn't help for the document as is, because of the already mentioned off-by-one in the stream length detection.
 
(0001015)
hockm0bm
04-23-10 20:00

Just for reference, fixed xref for the attached document looks as follows:

xref
225 1
0000149153 -842150451 n
227 1
0000149338 00000 n
trailer

Note that `1 n' is missing for object 225
 
(0001016)
misuj1am
04-23-10 20:57

this file was created by one of our tools (add_text) in win32. it means we have a problem either in kernel or xpdf
 
(0001017)
misuj1am
04-23-10 21:51

is this a valid xref?

=======
xref
0 225
0000000000 65535 f
0000123607 00000 n
 
(0001020)
misuj1am
04-24-10 19:33
edited on: 04-24-10 19:35

i hid some of the posts so nobody gets confused. the problem is that after reading the xref of that file we got something like this

        [223] {offset=22439 gen=0 type=xrefEntryUncompressed } XRefEntry
        [224] {offset=22505 gen=0 type=xrefEntryUncompressed } XRefEntry
        [225] {offset=4294967295 gen=-842150451 type=xrefEntryFree } XRefEntry
        [226] {offset=22649 gen=0 type=xrefEntryUncompressed } XRefEntry

cxref later tries to reuse the gen, but it can contain anything because it is not initialized. therefore this patch solves the problem (gcc and VS in release probably initialized everything to 0 so it might have worked)

===================================================================
RCS file: /cvsroot/pdfedit/pdfedit/src/xpdf/xpdf/XRef.cc,v
retrieving revision 1.27
diff -r1.27 XRef.cc
484a485
> entries[i].gen = 0;
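
A minimal sketch of the idea behind the one-line patch, assuming the table is grown with a realloc-style call and that new slots start out as free entries. `Entry`, `EntryType` and `growEntries` are hypothetical names, not the code in XRef.cc. As a side note, -842150451 is the bit pattern 0xCDCDCDCD, which is what the Microsoft debug C runtime writes into freshly allocated heap memory; that would fit both the win32 origin of the file and the "not initialized" diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

enum class EntryType { Free, Uncompressed };

// Hypothetical, simplified xref entry.
struct Entry {
    unsigned int offset;
    int gen;
    EntryType type;
};

// Grow the table from oldSize to newSize slots and give every new slot a
// defined state. realloc leaves the added memory indeterminate, so each
// field is set explicitly, including gen.
// Returns nullptr on allocation failure; the old block stays valid then.
Entry* growEntries(Entry* entries, int oldSize, int newSize) {
    assert(oldSize >= 0 && newSize >= oldSize);
    void* raw = std::realloc(entries, static_cast<std::size_t>(newSize) * sizeof(Entry));
    if (raw == nullptr) {
        return nullptr;
    }
    Entry* grown = static_cast<Entry*>(raw);
    for (int i = oldSize; i < newSize; ++i) {
        grown[i].offset = 0xffffffffu;
        grown[i].gen = 0;  // the initialization the patch adds
        grown[i].type = EntryType::Free;
    }
    return grown;
}
```

Setting every field in the same loop keeps the slot state independent of the compiler, the build mode and the allocator.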

 
(0001021)
hockm0bm
04-26-10 11:24

The patch for XRef definitely makes sense!

I still don't see how this could happen with delinearizator, though. We do not create new objects there so that CXref::reserveRef (which is the only place where we create a new indirect reference for an object) is not called and we cannot reuse an object with uninitialized gen number.

Delinearization of the attached document (kazajka.pdf) works just fine. What I guess happened here is that you delinearized and then edited the document. The document has a hole in its indirect object numbers (used objects 1-99 and 101-130), so the first created object ends up reusing obj. 100.
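
The hole scenario can be sketched as follows. This is only an illustration of how reusing the lowest free object number hands the slot's stored gen to the new reference; `Entry`, `Ref` and `reserveLowestFree` are hypothetical and do not claim to reproduce what CXref::reserveRef really does.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified xref entry and indirect reference.
struct Entry {
    int gen;
    bool inUse;
};

struct Ref {
    int num;
    int gen;
};

// Reuse the lowest free object number (object 0 is skipped) and hand out
// whatever generation number is stored in that slot. If the slot was never
// initialized, the garbage value ends up in the new reference.
std::optional<Ref> reserveLowestFree(std::vector<Entry>& entries) {
    for (std::size_t num = 1; num < entries.size(); ++num) {
        Entry& e = entries[num];
        if (!e.inUse) {
            e.inUse = true;
            return Ref{static_cast<int>(num), e.gen};
        }
    }
    return std::nullopt;  // no hole to reuse
}
```

With objects 1-99 and 101-130 in use and slot 100 holding a garbage gen, the first reservation returns object number 100 together with that garbage gen.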
 
(0001022)
misuj1am
04-26-10 11:27

i wrongly reported that it was caused by delinearizator, it was not. the problem was after saving changed objects
 
(0001023)
hockm0bm
04-26-10 11:28

gen_overflow_fixes.patch makes sure that we never use an overflow gen. number.
 
(0001024)
hockm0bm
04-26-10 11:33

I think that we are done with the primary cause of this issue. Thanks Jozo for your help.

Nevertheless, I am still concerned about three other issues which popped up.
1) Why do we create an invalid xref entry for an overflowed gen number?
2) Should XRef::constructXRef take care of int overflow?
3) Why doesn't XRef::constructXRef detect stream length correctly?

I guess we want to answer those questions before closing this bug.
 
(0001025)
hockm0bm
04-26-10 11:49

xref_uninitialized_gen.patch contains the full fix for this issue. It is based on Jozo's patch, extended to the other places where entries are initialized.
 
(0001026)
hockm0bm
04-26-10 13:38

> Nevertheless, I am still concerned about three other issues which popped up.
> 1) Why do we create an invalid xref entry for an overflowed gen number?

This is because OldStylePdfWriter::writeTrailer expects that an xref table row has 21 characters at maximum (nnnnnnnnnn ggggg n eoln), but the overflowed generation number exceeds 5 characters, so `n' doesn't fit and we end up with an invalid entry. pdfwriter_gen_overflow.patch fixes this issue.
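
The truncation can be reproduced with a short sketch, assuming the row is produced by an snprintf-style bounded write into a 21-byte buffer (20 characters plus the terminating NUL). The function names and the format string are assumptions made for illustration, not the code of OldStylePdfWriter::writeTrailer or of the patch.

```cpp
#include <cstdio>
#include <string>

// Bounded write without any range check: an oversized gen pushes the
// type letter and the end-of-line marker out of the buffer.
std::string formatRowTruncating(unsigned int offset, int gen, char type) {
    char row[21];
    std::snprintf(row, sizeof row, "%010u %05d %c\r\n", offset, gen, type);
    return std::string(row);
}

// Checked variant: refuse generation numbers that do not fit into five
// digits and verify that exactly 20 characters were produced.
bool formatRowChecked(unsigned int offset, int gen, char type, std::string& out) {
    if (gen < 0 || gen > 99999) {
        return false;
    }
    char row[21];
    const int written =
        std::snprintf(row, sizeof row, "%010u %05d %c\r\n", offset, gen, type);
    if (written != 20) {
        return false;
    }
    out.assign(row, 20);
    return true;
}
```

For offset 149153 and gen -842150451 the unchecked variant yields `0000149153 -84215045`, which is the mangled row quoted in the first note; the checked variant rejects the same input.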
 
(0001027)
hockm0bm
04-26-10 14:26

> 2) Should XRef::constructXRef take care of int overflow?

I would say that we will not break anything if we do. Correct PDFs are not affected by this change, and we will be safer with non-standard PDFs.
If you agree with that, Jozo, then I will commit xpdf_constructxref_unsigned_int.patch and close this issue.

> 3) Why doesn't XRef::constructXRef detect stream length correctly?

This seems to be some problem in the xpdf code. I have reported it as 0000351.
 
(0001030)
misuj1am
04-26-10 17:09

ACK, will test it later
 
(0001031)
hockm0bm
04-26-10 17:17

All patches are committed to the CVS.
 
(0001064)
misuj1am
05-03-10 22:50

kudos to michal, updated assigned to
 

- Issue History
Date Modified Username Field Change
04-23-10 08:39 misuj1am New Issue
04-23-10 08:39 misuj1am Status new => assigned
04-23-10 08:39 misuj1am Assigned To  => hockm0bm
04-23-10 08:39 misuj1am File Added: test.pdf-delinearised.pdf
04-23-10 16:59 hockm0bm Note Added: 0001010
04-23-10 17:29 hockm0bm Note Added: 0001011
04-23-10 18:53 hockm0bm Note Added: 0001012
04-23-10 19:25 hockm0bm Note Added: 0001013
04-23-10 19:52 hockm0bm Note Added: 0001014
04-23-10 19:53 hockm0bm File Added: xpdf_constructxref_unsigned_int.patch
04-23-10 20:00 hockm0bm Note Added: 0001015
04-23-10 20:57 misuj1am Note Added: 0001016
04-23-10 21:25 misuj1am File Deleted: test.pdf-delinearised.pdf
04-23-10 21:51 misuj1am Note Added: 0001017
04-23-10 21:59 misuj1am Note Added: 0001018
04-23-10 22:00 misuj1am File Added: kazajka.pdf
04-24-10 12:01 hockm0bm Note Added: 0001019
04-24-10 19:28 misuj1am Note View State: private: 1019
04-24-10 19:28 misuj1am Note View State: private: 1018
04-24-10 19:33 misuj1am Note Added: 0001020
04-24-10 19:35 misuj1am Note Edited: 0001020
04-26-10 11:09 hockm0bm Summary checkLinearized fails with throw MalformedFormatExeption("bad data stream") but adobe has no problems => Delinearizator produces objects with overflow gen numbers
04-26-10 11:24 hockm0bm Note Added: 0001021
04-26-10 11:26 hockm0bm Summary Delinearizator produces objects with overflow gen numbers => invalid xref entries are produces if we are reusing unused object number
04-26-10 11:27 misuj1am Note Added: 0001022
04-26-10 11:27 hockm0bm File Added: gen_overflow_fixes.patch
04-26-10 11:28 hockm0bm Note Added: 0001023
04-26-10 11:33 hockm0bm Note Added: 0001024
04-26-10 11:48 hockm0bm File Added: xref_uninitialized_gen.patch
04-26-10 11:49 hockm0bm Note Added: 0001025
04-26-10 11:53 hockm0bm File Deleted: xref_uninitialized_gen.patch
04-26-10 11:53 hockm0bm File Added: xref_uninitialized_gen.patch
04-26-10 13:38 hockm0bm File Added: pdfwriter_gen_overflow.patch
04-26-10 13:38 hockm0bm Note Added: 0001026
04-26-10 14:26 hockm0bm Note Added: 0001027
04-26-10 14:26 hockm0bm Assigned To hockm0bm => misuj1am
04-26-10 14:26 hockm0bm Status assigned => feedback
04-26-10 15:14 hockm0bm Relationship added related to 0000351
04-26-10 17:09 misuj1am Note Added: 0001030
04-26-10 17:17 hockm0bm Status feedback => resolved
04-26-10 17:17 hockm0bm Resolution open => fixed
04-26-10 17:17 hockm0bm Note Added: 0001031
05-03-10 22:50 misuj1am Assigned To misuj1am => hockm0bm
05-03-10 22:50 misuj1am Note Added: 0001064