| PDFedit | Bugtracker |
| ID | Category | Severity | Reproducibility | Date Submitted | Last Update |
| 0000350 | [PDFedit] =Other (Kernel)= | minor | always | 04-23-10 08:39 | 05-03-10 22:50 |
| Reporter | misuj1am | View Status | public |
| Assigned To | hockm0bm |
| Priority | normal | Resolution | fixed |
| Status | resolved | Product Version | |
| Summary | 0000350: invalid xref entries are produced if we are reusing an unused object number |
| Description | Trying to open the attached PDF file results in an exception. Adobe has no problems. |
Notes |
|
|
(0001010) hockm0bm 04-23-10 16:59 |
First of all, the xref table is mangled:
    xref
    225 1
    0000149153 -84215045 <<<<<<<<<<<
    227 1
    0000149338 00000 n
    trailer
This, however, is not the critical thing, because XRef::constructXRef will succeed, so we still have an overview of the object positions. The reason why we fail in checkLinearized is that the stream for object [2 0], which is one of the checked objects, has a bad length (off by one, actually: 117 instead of 116). This number is retrieved from XRef::streamEnds, which is built during constructXRef. It is hard to tell whether this number is read incorrectly or the stream really has a wrong length. |
|
(0001011) hockm0bm 04-23-10 17:29 |
I have tried to turn off XRef::getStreamEnd (by simply returning gFalse right at the beginning) to rule out a bad stream length calculation (note that this code path is used _only_ for damaged documents), and checkLinearized doesn't fail then. Nevertheless, we end up in the very same situation as xpdf/kpdf: only an empty page is displayed and the xpdf code complains about weird page contents:
    Error: Weird page contents
    Error: Weird page contents
This message is printed by Gfx::display if the given object is neither a stream nor an array of streams. In this case we have an array, and the problem is in item[8], which is Ref [225 -842150451]. It is resolved to null because of a generation number mismatch (the one stored in XRef::entries is 2147483647). I am not sure whether this generation number is valid or whether there is a sign overflow bug in the parser. Anyway, where is this document from? |
|
(0001012) hockm0bm 04-23-10 18:53 |
OK, so this clearly doesn't comply with the specification: "A non-negative integer generation number. In a newly created file, all indirect objects have generation numbers of 0. Nonzero generation numbers may be introduced when the file is later updated; see Sections 3.4.3, “Cross-Reference Table,” and 3.4.5, “Incremental Updates.”" |
|
(0001013) hockm0bm 04-23-10 19:25 |
Ohh, wait a moment. The xref table is ignored because it is considered mangled (the n or f is missing at the end of the line), so XRef::constructXRef doesn't care about it and tries to build the xref from all existing objects in the file. And the object with num=225 has generation number 3452816845 (which is -842150451 when read as a signed 32-bit integer). constructXRef uses atoi for number conversion, which means that 3452816845 (which doesn't fit into the int type) is converted to 0x7fffffff, i.e. INT_MAX. Appendix C of the PDF specification says that an Integer value is limited to a signed 32-bit number. There is no mention of a generation number limit, though. I am not sure whether this is something worth fixing... |
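To make the overflow above concrete, here is a minimal sketch of a range-checked conversion. `parseGen` and `asSigned32` are hypothetical helpers written for this note, not xpdf functions, and rejecting everything above INT_MAX is an assumption about what a checked parser might do, not what constructXRef does.

```cpp
#include <cerrno>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of xpdf): parse a decimal generation number
// and report failure instead of silently clamping or wrapping like atoi may.
bool parseGen(const char *s, int *out) {
  errno = 0;
  char *end = nullptr;
  unsigned long long v = std::strtoull(s, &end, 10);
  // No digits, out of range for strtoull, or too large for int: reject.
  // A leading '-' makes strtoull return a huge wrapped value, so negative
  // input is rejected by the INT_MAX comparison as well.
  if (end == s || errno == ERANGE ||
      v > static_cast<unsigned long long>(INT_MAX)) {
    return false;
  }
  *out = static_cast<int>(v);
  return true;
}

// Reinterpret the bits of an unsigned 32-bit value as a signed one.
std::int32_t asSigned32(std::uint32_t v) {
  std::int32_t r;
  std::memcpy(&r, &v, sizeof r);
  return r;
}
```

With this sketch, `parseGen("3452816845", &g)` fails instead of yielding INT_MAX, and `asSigned32(3452816845u)` gives -842150451, the value that shows up in the xref row.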
|
(0001014) hockm0bm 04-23-10 19:52 |
If we want to hack around that then the following patch should help. Please note that it doesn't help for the document as is, because of already mentioned off-by-one in the stream length detection. |
|
(0001015) hockm0bm 04-23-10 20:00 |
Just for reference, the fixed xref for the attached document looks as follows:
    xref
    225 1
    0000149153 -842150451 n
    227 1
    0000149338 00000 n
    trailer
Note that `1 n' is missing for object 225 |
|
(0001016) misuj1am 04-23-10 20:57 |
this file was created by one of our tools (add_text) in win32. it means we have a problem either in kernel or xpdf |
|
(0001017) misuj1am 04-23-10 21:51 |
is this a valid xref?
=======
    xref
    0 225
    0000000000 65535 f
    0000123607 00000 n |
|
(0001020) misuj1am 04-24-10 19:33 edited on: 04-24-10 19:35 |
i hid some of the posts so nobody gets confused. the problem is that after reading the xref of that file we got something like this
    [223] {offset=22439 gen=0 type=xrefEntryUncompressed } XRefEntry
    [224] {offset=22505 gen=0 type=xrefEntryUncompressed } XRefEntry
    [225] {offset=4294967295 gen=-842150451 type=xrefEntryFree } XRefEntry
    [226] {offset=22649 gen=0 type=xrefEntryUncompressed } XRefEntry
cxref later tries to reuse the gen of the free entry, but it can contain anything because it is never initialized. therefore this patch solves the problem (gcc and VS in release probably initialized everything to 0, so it might have worked there)
    ===================================================================
    RCS file: /cvsroot/pdfedit/pdfedit/src/xpdf/xpdf/XRef.cc,v
    retrieving revision 1.27
    diff -r1.27 XRef.cc
    484a485
    > entries[i].gen = 0; |
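The idea behind the one-line patch can be sketched as follows. This is only an illustration under assumptions: `Entry`, `EntryType` and `growEntries` are simplified stand-ins invented here, not the xpdf types, and the free-entry offset 0xffffffff is taken from the dump above (offset=4294967295).

```cpp
#include <cstdlib>

// Simplified stand-ins for the xref entry types (hypothetical names).
enum EntryType { entryFree, entryUncompressed };

struct Entry {
  unsigned offset;
  int gen;
  EntryType type;
};

// Hypothetical helper: grow a raw entry array and initialize every field of
// every new slot. realloc leaves the new memory indeterminate, so skipping
// the gen assignment would leave garbage there, as in the dump for [225].
Entry *growEntries(Entry *entries, int oldSize, int newSize) {
  Entry *p = static_cast<Entry *>(
      std::realloc(entries, static_cast<std::size_t>(newSize) * sizeof(Entry)));
  if (!p) {
    return nullptr;  // the caller still owns the old array
  }
  for (int i = oldSize; i < newSize; ++i) {
    p[i].offset = 0xffffffffu;
    p[i].gen = 0;  // the line the patch adds
    p[i].type = entryFree;
  }
  return p;
}
```

Entries that already existed keep their values; only the newly added tail is set to a known state.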
|
(0001021) hockm0bm 04-26-10 11:24 |
The patch for XRef definitely makes sense! I still don't see how this could happen with the delinearizator, though. We do not create new objects there, so CXref::reserveRef (which is the only place where we create a new indirect reference for an object) is not called and we cannot reuse an object with an uninitialized gen number. Delinearization of the attached document (kazajka.pdf) works just fine. What I guess happened here is that you delinearized the document and then edited it. The document has a hole in its indirect object numbers (used objects 1-99, 101-130), and so the first created object ends up reusing obj. 100. |
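The reuse scenario guessed at above can be sketched like this. The real CXref::reserveRef is not shown in this thread, so `reserveFirstFree`, `Slot` and `Ref` are hypothetical and only assume that a new object takes the first free object number together with whatever generation number is stored in that slot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of an xref table slot and an indirect ref.
struct Slot {
  bool inUse;
  int gen;
};

struct Ref {
  int num;
  int gen;
};

// Assumed behaviour: take the first free object number (object 0 is skipped,
// it is the head of the free list) and reuse the gen stored in that slot.
// If there is no hole, append a new slot with gen 0.
Ref reserveFirstFree(std::vector<Slot> &slots) {
  if (slots.empty()) {
    slots.push_back(Slot{false, 65535});  // entry 0, as in "0000000000 65535 f"
  }
  for (std::size_t i = 1; i < slots.size(); ++i) {
    if (!slots[i].inUse) {
      slots[i].inUse = true;
      return Ref{static_cast<int>(i), slots[i].gen};
    }
  }
  slots.push_back(Slot{true, 0});
  return Ref{static_cast<int>(slots.size() - 1), 0};
}
```

If the hole's gen was never initialized, the garbage value is handed straight to the new object, which would explain a reference such as [225 -842150451].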
|
(0001022) misuj1am 04-26-10 11:27 |
i wrongly reported that it was caused by the delinearizator; it was not. the problem appeared after saving changed objects |
|
(0001023) hockm0bm 04-26-10 11:28 |
gen_overflow_fixes.patch makes sure that we never use an overflow gen. number. |
|
(0001024) hockm0bm 04-26-10 11:33 |
I think that we are done with the primary cause of this issue. Thanks Jozo for your help. Nevertheless I am still concerned about 3 other issues which popped up.
1) why do we create an invalid xref entry for an overflow entry?
2) Should XRef::constructXRef take care of int overflow?
3) why doesn't XRef::constructXRef detect the stream length correctly?
I guess we want to answer those questions before closing this bug. |
|
(0001025) hockm0bm 04-26-10 11:49 |
xref_uninitialized_gen.patch contains the full fix for this issue. It is based on Jozo's patch and other places with entry initialization were added. |
|
(0001026) hockm0bm 04-26-10 13:38 |
> Nevertheless I am still concerned about 3 other issues which popped up.
> 1) why do we create an invalid xref entry for an overflow entry?
This is because OldStylePdfWriter::writeTrailer expects an xref table row to have at most 21 characters (nnnnnnnnnn ggggg n eoln), but the overflowed generation number is longer than 5 characters, so the `n' doesn't fit and we end up with an invalid entry. pdfwriter_gen_overflow.patch fixes this issue. |
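The fixed-width problem described above can be illustrated with a small sketch. `formatXrefRow` is a hypothetical function, not the OldStylePdfWriter code or the committed patch; it assumes the row layout from the PDF Reference (10-digit offset, space, 5-digit generation, space, n or f, 2-character end of line, 20 bytes in total) and refuses to emit anything that does not fit.

```cpp
#include <cstdio>
#include <string>

// Hypothetical formatter: build one classic xref table row and fail when the
// values do not fit the fixed 20-byte layout "nnnnnnnnnn ggggg n" + 2-byte EOL.
bool formatXrefRow(unsigned long offset, long gen, bool inUse,
                   std::string *out) {
  if (gen < 0) {
    return false;  // generation numbers are non-negative
  }
  char buf[32];
  // snprintf returns the length the full output would have had, so an
  // overlong offset or gen shows up as n != 20 even if buf was too small.
  int n = std::snprintf(buf, sizeof buf, "%010lu %05ld %c \n", offset, gen,
                        inUse ? 'n' : 'f');
  if (n != 20) {
    return false;
  }
  out->assign(buf, static_cast<std::size_t>(n));
  return true;
}
```

A generation number such as -842150451 is rejected here instead of pushing the `n' marker out of the row.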
|
(0001027) hockm0bm 04-26-10 14:26 |
> 2) Should XRef::constructXRef take care of int overflow?
I would say that we will not break anything if we do. Correct PDFs are not affected by this change, and we will be safer with non-standard PDFs. If you agree with that, Jozo, then I will commit xpdf_constructxref_unsigned_int.patch and close this issue.
> 3) why doesn't XRef::constructXRef detect the stream length correctly?
This seems to be a problem in the xpdf code. I have reported it as 0000351. |
|
(0001030) misuj1am 04-26-10 17:09 |
ACK, will test it later |
|
(0001031) hockm0bm 04-26-10 17:17 |
All patches are committed to the CVS. |
|
(0001064) misuj1am 05-03-10 22:50 |
kudos to michal, updated assigned to |