
%    Copyright (C) 1991, 1995, 1996, 1998, 1999 Aladdin Enterprises.  All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA  94903, U.S.A., +1(415)492-9861.

% $Id$
% Extract the ASCII text from a PostScript file.  Nothing is displayed.
% Instead, ASCII information is written to stdout.  The idea is similar to
% Glenn Reid's `distillery', only a lot more simple-minded, and less robust.

% If SIMPLE is defined, just the text is written, with a guess at line
% breaks and word spacing.  If SIMPLE is not defined, lines are written
% to stdout as follows:
%
%       F <height> <width> (<fontname>)
%               Indicate the font height and the width of a space.
%
%       P
%               Indicate the end of the page.
%
%       S <x> <y> (<string>) <width>
%               Display a string.
%
% <width> and <height> are integer dimensions in units of 1/720".
% <x> and <y> are integer coordinates, in units of 1/720", with the origin
%   at the lower left.
% <string> and <fontname> are strings represented with the standard
%   PostScript escape conventions.

% If COMPLEX is defined, the following additional types of lines are
% written to stdout.
%
%       C <r> <g> <b>
%               Indicate the current color.
%
%       I <x> <y> <width> <height>
%               Note the presence of an image.
%
%       R <x> <y> <width> <height>
%               Fill a rectangle.
%
% <r>, <g>, and <b> are RGB values expressed as integers between 0 and 1000.
%
% Note that future versions of this program (in COMPLEX mode) may add
% other output elements, so programs parsing the output should be
% prepared to ignore elements that they do not recognize.
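
% Example invocations (a sketch; input.ps and out.txt are placeholder file
% names): SIMPLE and COMPLEX are only tested with `where', so either mode
% can be selected by defining the name on the Ghostscript command line
% with -d, together with the flags discussed below.
%
%       gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE ps2ascii.ps input.ps -c quit
%       gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT -dCOMPLEX ps2ascii.ps input.ps -c quit > out.txt
%
% With neither name defined, only the F, P and S lines described above
% are produced.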

% Note that this code will only work in all cases if systemdict is writable
% and if `binding' the definitions of operators defined as procedures
% is deferred.  For this reason, it is normally invoked with
%       gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT ps2ascii.ps

% Thanks to:
%    J Greely <jgreely@cis.ohio-state.edu> for improvements to this code;
%    Jerry Whelan <jerryw@abode.ccd.bnl.gov> for motivating other improvements;
%    David M. Jones <dmjones@theory.lcs.mit.edu> for improvements noted below.

%%
%% Additional modifications by David M. Jones
%% (dmjones@theory.lcs.mit.edu), December 23, 1997
%%
%% (a) Rewrote forall loop at the end of .show.write.  This fixes a
%%     stack leakage problem, but the changes are more significant
%%     than that.
%%
%%     .char.map includes the names of all characters in the
%%     StandardEncoding, ISOLatin1Encoding, OT1Encoding and
%%     T1Encoding vectors.  Thus, if the Encoding vector for the
%%     current font contains a name that is not in .char.map, it's
%%     redundant to check if the Encoding vector is equal to one of
%%     the known vectors.  Previous versions of ps2ascii would give up
%%     at this point, and substitute an asterisk (*) for the
%%     character.  I've taken the liberty of instead using the
%%     OT1Encoding vector to translate the character, on the grounds
%%     that in the cases I'm most interested in, a font without a
%%     useful Encoding vector was most likely created by a DVI to PS
%%     converter such as dvips or DVILASER (and OT1Encoding is largely
%%     compatible with StandardEncoding anyway).
%%
%%     [Note that this does not make my earlier changes to support
%%     dvips (see fix (a) under my 1996 changes) completely obsolete,
%%     since there's additional useful information I can extract in
%%     that case.]
%%
%%     Overall, this should provide better support for some documents
%%     (e.g., DVILASER documents will no longer be translated into a
%%     series of *'s) without breaking any other documents any worse
%%     than they already were broken.
%%
%% (b) Fixed two bugs in dvips.df-tail:
%%     (1) changed "dup 127" to "dup 128" to fix a fencepost error, and
%%     (2) gave each font its own FontName rather than having all
%%         fonts share the same name.
%%
%% (c) Added one further refinement to the heuristic for detecting
%%     paragraph breaks: do not ever start a new paragraph after a
%%     line ending in a hyphen.
%%
%% (d) Added a bunch of missing letters from the T1Encoding,
%%     OT1Encoding and ISOLatin1Encoding vectors to .letter.chars to
%%     improve the hyphen-elimination algorithm.  This still won't help
%%     if there's no useful Encoding vector.
%%
%% NOTE: A better solution to the problem of missing Encoding vectors
%% might be to redefine definefont to check whether the Encoding
%% vector is sensible and, if not, replace it by a default.  This
%% would alleviate the need for constant tests in the .show.write
%% loop, as well as automatically solving the problem noted in fix
%% (d) above, and the similar problem with .break.chars.  This should
%% be investigated.  Also, the hyphen-elimination algorithm really
%% needs to be looked at carefully and rethought.
%%

%%* Modifications to ps2ascii.ps by David M. Jones
%%* (dmjones@theory.lcs.mit.edu), June 25-July 8, 1996
%%*
%%* Modifications:
%%*
%%* (a) Added code to give better support for dvips files by providing
%%*     FontBBoxes, FontNames and Encoding vectors for downloaded bitmap
%%*     fonts.  This is done by using dvips's start-hook to overwrite
%%*     the df-tail and D procedures that dvips uses to define its
%%*     Type 3 bitmap fonts.  Thus, this change should provide better
%%*     support for dvips-generated PS files without affecting the
%%*     handling of other documents.
%%*
%%* (b) Fixed two bugs that could potentially affect any PS file, not
%%*     just those created by dvips: (1) added missing "get" operator
%%*     in .show.write and (2) fixed bug that caused a hyphen at the
%%*     end of a line to be replaced by a space rather than being
%%*     deleted.  Note that the first bug was a source of stack
%%*     leakage, causing ps2ascii to run out of operand stack space
%%*     occasionally.  Search for "%%* BF" to find these
%%*     modifications.
%%*
%%* (c) Improved the heuristic for determining whether a line break
%%*     has occurred and whether a line break represents a paragraph
%%*     break.  Previously, any change in the vertical position caused
%%*     a line break; now a line break is only registered if the
%%*     change is larger than the height of the current font.  This
%%*     means that superscripts, subscripts, and such things as
%%*     shifted accents generated by TeX won't cause line breaks.
%%*
%%*     Paragraph recognition is now done by comparing the indentation
%%*     of the new line to the indentation of the previous line and by
%%*     comparing the vertical distance between the new line and the
%%*     previous line to the vertical distance between the previous
%%*     line and its predecessor.
%%*
%%* (d) Added a hook for renaming the files where stdout and stderr
%%*     go.
%%*
%%* In general, my additions or changes to the code are described in
%%* comments beginning with "%%*".  However, there are numerous other
%%* places where I have either re-formatted code or added comments to
%%* the code while I was trying to understand it.  These are usually
%%* not specially marked.
%%*

/QUIET true def

systemdict wcheck { systemdict } { userdict } ifelse begin

/.max where { pop } { /.max { 2 copy lt { exch } if pop } bind def } ifelse

/COMPLEX dup where { pop true } { false } ifelse def
/SIMPLE dup where { pop true } { false } ifelse def

/setglobal where
 { pop currentglobal /setglobal load true setglobal }
 { { } }
ifelse

% Define a way to store and retrieve integers that survives save/restore.
% The template is wide enough for any 32-bit integer, e.g. -2147483648.
/.i.string0 (0          ) def
/.i.string .i.string0 length string def
/.iget { cvi } bind def
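
% Illustration only (the name /.demo.int is hypothetical, and .inew and
% .iput, defined next, are assumed to be loaded): restore rolls back
% changes to composite objects in local VM except the contents of
% strings, so an integer kept in a string this way is still there after
% a restore.
%
%       /.demo.int .inew def
%       save .demo.int 42 .iput restore
%       .demo.int .iget ==              % prints 42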

/.iput { exch //.i.string exch copy cvs pop } bind def
/.inew { //.i.string0 dup length string copy } bind def

% We only want to redefine operators if they are defined already.

/codef { 1 index where { pop def } { pop pop } ifelse } def

% Redefine the end-of-page operators.

/erasepage { } codef
/copypage { SIMPLE { (\014) } { (P\n) } ifelse //print } codef
/showpage { copypage erasepage initgraphics } codef

% Redefine the fill operators to detect rectangles.

/.orderrect     % <llx> <lly> <urx> <ury> .orderrect <llx> <lly> <w> <h>
 {      % Ensure llx <= urx, lly <= ury.
   1 index 4 index lt { 4 2 roll } if
   dup 3 index lt { 3 1 roll exch } if
   exch 3 index sub exch 2 index sub
 } odef
/.fillcomplex
 {      % Do a first pass to see if the path is all rectangles in
        % the output coordinate system.  We don't worry about overlapping
        % rectangles that might be partially not filled.
        % Stack: mark llx0 lly0 urx0 ury0 ... true mark x0 y0 ...
   mark true mark
        % Add a final moveto so we pick up any trailing unclosed subpath.
   0 0 itransform moveto
   { .coord counttomark 2 gt
      { counttomark 4 gt { .fillcheckrect } { 4 2 roll pop pop } ifelse }
     if
   }
   { .coord }
   { cleartomark not mark exit }
   { counttomark -2 roll 2 copy counttomark 2 roll .fillcheckrect }
   pathforall cleartomark
    { .showcolor counttomark 4 idiv
       { counttomark -4 roll .orderrect
         (R ) //print .show==4
       }
      repeat pop
    }
    { cleartomark
    }
   ifelse
 } odef
/.fillcheckrect
 {      % Check whether the current subpath is a rectangle.
        % If it is, add it to the list of rectangles being accumulated;
        % if not exit the .fillcomplex loop.
        % The subpath has not been closed.
        % Stack: as in .fillcomplex, + newx newy
   counttomark 10 eq { 9 index 9 index 4 2 roll } if
   counttomark 12 ne { cleartomark not mark exit } if
   12 2 roll
        % Check for the two possible forms of rectangles:
        %       x0 y0  x0 y1  x1 y1  x1 y0  x0 y0
        %       x0 y0  x1 y0  x1 y1  x0 y1  x0 y0
   9 index 2 index eq 9 index 2 index eq and
   10 index 9 index eq
    {   % Check for first form.
      7 index 6 index eq and 6 index 5 index eq and 3 index 2 index eq and
    }
    {   % Check for second form.
      9 index 8 index eq and
      8 index 7 index eq and 5 index 4 index eq and 4 index 3 index eq and
    }
   ifelse not { cleartomark not mark exit } if
        % We have a rectangle.
   pop pop pop pop 4 2 roll pop pop 8 4 roll
 } odef
/eofill { COMPLEX { .fillcomplex } if newpath } codef
/fill { COMPLEX { .fillcomplex } if newpath } codef
/rectfill { gsave newpath .rectappend fill grestore } codef
/ueofill { gsave newpath uappend eofill grestore } codef
/ufill { gsave newpath uappend fill grestore } codef

% Redefine the stroke operators to detect rectangles.

/rectstroke
 { gsave newpath
   dup type dup /arraytype eq exch /packedarraytype eq or
    { dup length 6 eq { exch .rectappend concat } { .rectappend } ifelse }
    { .rectappend }
   ifelse stroke grestore
 } codef
/.strokeline    % <fromx> <fromy> <tox> <toy> .strokeline <tox> <toy>
                % Note: fromx and fromy are in output coordinates;
                % tox and toy are in user coordinates.
 { .coord 2 copy 6 2 roll .orderrect
                % Add in the line width.  Assume square or round caps.
   currentlinewidth 2 div dup .dcoord add abs 1 .max 5 1 roll
   4 index add 4 1 roll 4 index add 4 1 roll
   4 index sub 4 1 roll 5 -1 roll sub 4 1 roll
   (R ) //print .show==4
 } odef
/.strokecomplex
 {      % Do a first pass to see if the path is all horizontal and vertical
        % lines in the output coordinate system.
        % Stack: true mark origx origy curx cury
   true mark null null null null
   { .coord 6 2 roll pop pop pop pop 2 copy }
   { .coord 1 index 4 index eq 1 index 4 index eq or
      { 4 2 roll pop pop }
      { cleartomark not mark exit }
     ifelse
   }
   { cleartomark not mark exit }
   { counttomark -2 roll 2 copy counttomark 2 roll
     1 index 4 index eq 1 index 4 index eq or
      { pop pop 2 copy }
      { cleartomark not mark exit }
     ifelse
   }
   pathforall cleartomark
   0 currentlinewidth .dcoord 0 eq exch 0 eq or and
        % Do the second pass to write out the rectangles.
        % Stack: origx origy curx cury
    { .showcolor null null null null
      { 6 2 roll pop pop pop pop 2 copy .coord }
      { .strokeline }
      { }
      { 3 index 3 index .strokeline }
      pathforall pop pop pop pop
    }
   if
 } odef
/stroke { COMPLEX { .strokecomplex } if newpath } codef
/ustroke
 { gsave newpath
   dup length 6 eq { exch uappend concat } { uappend } ifelse
   stroke grestore
 } codef

% The image operators must read the input and note the dimensions.
% Eventually we should redefine these to detect 1-bit-high all-black images,
% since this is how dvips does underlining (!).

/.noteimagerect         % <width> <height> <matrix> .noteimagerect
 { COMPLEX
    { gsave setmatrix itransform 0 0 itransform
      grestore .coord 4 2 roll .coord .orderrect
      (I ) //print .show==4
    }
    { pop pop pop
    }
   ifelse
 } odef
/colorimage where
 { pop /colorimage
    { 1 index
       { dup 6 add index 1 index 6 add index 2 index 5 add index }
       { 6 index 6 index 5 index }
      ifelse .noteimagerect gsave nulldevice //colorimage grestore
    } codef
 } if
/.noteimage             % Arguments as for image[mask]
 { dup type /dicttype eq
    { dup /Width get 1 index /Height get 2 index /ImageMatrix get }
    { 4 index 4 index 3 index }
   ifelse .noteimagerect
 } odef
/image { .noteimage gsave nulldevice //image grestore } codef
/imagemask { .noteimage gsave nulldevice //imagemask grestore } codef

% Output the current color if necessary.
/.color.r .inew def
  .color.r -1 .iput             % make sure we write the color at the beginning
/.color.g .inew def
/.color.b .inew def
/.showcolor
 { COMPLEX
    { currentrgbcolor
      1000 mul round cvi
      3 1 roll 1000 mul round cvi
      exch 1000 mul round cvi
                % Stack: b g r
      dup //.color.r .iget eq
      2 index //.color.g .iget eq and
      3 index //.color.b .iget eq and
       { pop pop pop
       }
       { (C ) //print
         dup //.color.r exch .iput .show==only
         ( ) //print dup //.color.g exch .iput .show==only
         ( ) //print dup //.color.b exch .iput .show==only
         (\n) //print
       }
      ifelse
    }
   if
 } bind def

% Redefine `show'.

% Set things up so our output will be in tenths of a point, with origin at
% lower left.  This isolates us from the peculiarities of individual devices.

/.show.ident.matrix matrix def
/.show.ident {          % - .show.ident <scale> <matrix>
%   //.show.ident.matrix defaultmatrix
%               % Assume the original transformation is well-behaved.
%   0.1 0 2 index dtransform abs exch abs .max /.show.scale exch def
%   0.1 dup 3 -1 roll scale
  gsave initmatrix
                % Assume the original transformation is well-behaved...
  0.1 0 dtransform abs exch abs .max
  0.1 dup scale .show.ident.matrix currentmatrix
                % ... but undo any rotation into landscape orientation.
  dup 0 get 0 eq {
    1 get dup abs div 90 mul rotate
    .show.ident.matrix currentmatrix
  } if
  grestore
} bind def
/.coord {               % <x> <y> .coord <x'> <y'>
  transform .show.ident exch pop itransform
  exch round cvi exch round cvi
} odef
/.dcoord {              % <dx> <dy> .dcoord <dx'> <dy'>
                % Transforming distances is trickier, because
                % the coordinate system might be rotated.
  .show.ident pop 3 1 roll
  exch 0 dtransform
    dup mul exch dup mul add sqrt
    2 index div round cvi
  exch 0 exch dtransform
    dup mul exch dup mul add sqrt
    3 -1 roll div round cvi
} odef

% Remember the current X, Y, and height.
/.show.x .inew def
/.show.y .inew def
/.show.height .inew def
% Remember the last character of the previous string; if it was a

% hyphen preceded by a letter, we didn't output the hyphen.
/.show.last (\000) def
% Remember the current font.
/.font.name 130 string def
/.font.name.length .inew def
/.font.height .inew def
/.font.width .inew def

%%* Also remember indentation of current line and previous vertical
%%* skip

/.show.indent .inew def
/.show.dy .inew def

% We have to redirect stdout somehow....

/.show.stdout { (%stdout) (w) file } bind def

% Make sure writing will work even if a program uses =string.
/.show.string =string length string def
/.show.=string =string length string def
/.show==only
 { //=string //.show.=string copy pop
   dup type /stringtype eq
    { dup length //.show.string length le
       { dup rcheck { //.show.string copy } if
       } if
    } if
   .show.stdout exch write==only
   //.show.=string //=string copy pop
 } odef
/.show==4
 { 4 -1 roll .show==only ( ) //print
   3 -1 roll .show==only ( ) //print
   exch .show==only ( ) //print
   .show==only (\n) //print
 } odef

/.showwidth     % Same as stringwidth, but disable COMPLEX so that
                % we don't try to detect rectangles during BuildChar.
 { COMPLEX
    { /COMPLEX false def stringwidth /COMPLEX true def }
    { stringwidth }
   ifelse
 } odef

/.showfont      % <string> .showfont <string>
 { gsave
                % Try getting the height and width of the font from the FontBBox.
     currentfont /FontBBox .knownget not { {0 0 0 0} } if
     aload pop          % llx lly urx ury
     exch 4 -1 roll     % lly ury urx llx
     sub                % lly ury dx
     3 1 roll exch      % dx ury lly
     sub                % dx dy
     2 copy .max 0 ne
      { currentfont /FontMatrix get dtransform
      }

      { pop pop
                % Fonts produced by dvips, among other applications, have
                % BuildChar procedures that bomb out when given unexpected
                % characters, and there is no way to determine whether a given
                % character will do this.  So for Type 1 fonts, we measure a
                % typical character ('X'); for others, we punt.
        currentfont /FontType get 1 eq
         { (X) .showwidth pop dup 1.3 mul
         }
         {      % No safe way to get the character size.  Punt.
           0 0
         }
        ifelse
      }
     ifelse .dcoord exch
     currentfont /FontName .knownget not { () } if
     dup type /stringtype ne { //.show.string cvs } if
   grestore
                % Stack: height width fontname
   SIMPLE
    { pop pop //.show.height exch .iput }
    { 2 index //.font.height .iget eq
      2 index //.font.width .iget eq and
      1 index //.font.name 0 //.font.name.length .iget getinterval eq and
       { pop pop pop
       }
       { (F ) //print
         3 -1 roll dup //.font.height exch .iput .show==only ( ) //print
         exch dup //.font.width exch .iput .show==only ( ) //print
         dup length //.font.name.length exch .iput
         //.font.name cvs .show==only (\n) //print
       }
      ifelse
    }
   ifelse
 } odef

% Define the letters -- characters which, if they occur followed by a hyphen
% at the end of a line, cause the hyphen and line break to be ignored.
/.letter.chars 100 dict def
mark
65 1 90 { dup 32 add } for
  counttomark
   { StandardEncoding exch get .letter.chars exch dup put }
  repeat
pop

%%* Add the rest of the letters from the [O]T1Encoding and
%%* ISOLatin1Encoding vectors

mark
    /AE /Aacute /Abreve /Acircumflex /Adieresis /Agrave /Aogonek /Aring
    /Atilde /Cacute /Ccaron /Ccedilla /Dcaron /Eacute /Ecaron /Ecircumflex
    /Edieresis /Egrave /Eng /Eogonek /Eth /Gbreve /Germandbls /IJ /Iacute
    /Icircumflex /Idieresis /Idot /Igrave /Lacute /Lcaron /Lslash /Nacute
    /Ncaron /Ntilde /OE /Oacute /Ocircumflex /Odieresis /Ograve
    /Ohungarumlaut /Oslash /Otilde /Racute /Rcaron /Sacute /Scaron
    /Scedilla /Tcaron /Tcedilla /Thorn /Uacute /Ucircumflex /Udieresis
    /Ugrave /Uhungarumlaut /Uring /Yacute /Ydieresis /Zacute /Zcaron /Zdot
    /aacute /abreve /acircumflex /adieresis /ae /agrave /aogonek /aring
    /atilde /cacute /ccaron /ccedilla /dbar /dcaron /dotlessi /dotlessj
    /eacute /ecaron /ecircumflex /edieresis /egrave /eng /eogonek /eth
    /exclamdown /ff /ffi /ffl /fi /fl /gbreve /germandbls /iacute
    /icircumflex /idieresis /igrave /ij /lacute /lcaron /lslash /nacute
    /ncaron /ntilde /oacute /ocircumflex /odieresis /oe /ograve
    /ohungarumlaut /oslash /otilde /questiondown /racute /rcaron /sacute
    /scaron /scedilla /section /sterling /tcaron /tcedilla /thorn /uacute
    /ucircumflex /udieresis /ugrave /uhungarumlaut /uring /yacute
    /ydieresis /zacute /zcaron /zdot
    counttomark
     { .letter.chars exch dup put }
    repeat
pop

% Define a set of characters which, if they occur at the start of a line,
% are taken as indicating a paragraph break.
/.break.chars 50 dict def
mark
    /bullet /dagger /daggerdbl /periodcentered /section
    counttomark
     { .break.chars exch dup put }
    repeat
pop

% Define character translation to ASCII.
% We have to do this for the entire character set.

/.char.map 500 dict def
/.chars.def { counttomark 2 idiv { .char.map 3 1 roll put } repeat pop } def

% Encode the printable ASCII characters.

mark 32 1 126
 { 1 string dup 0 4 -1 roll put
   dup 0 get StandardEncoding exch get exch
 }
for .chars.def

% Encode accents.

mark
  /acute (')  /caron (^)  /cedilla (,)  /circumflex (^)
  /dieresis (")  /grave (`)  /ring (*)  /tilde (~)
.chars.def
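
% For illustration (assuming this file has been loaded and its definitions
% are visible), the tables built above can be queried directly; this is
% how .show.write and the paragraph heuristics consult them:
%
%       .letter.chars /eacute known ==          % prints true
%       .letter.chars /hyphen known ==          % prints false
%       .break.chars /bullet known ==           % prints true
%       .char.map /circumflex get ==            % prints (^)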

% Encode the ISO accented characters.

mark 192 1 255
 { ISOLatin1Encoding exch get =string cvs
   dup 0 1 getinterval 1 index dup length 1 sub 1 exch getinterval
   .char.map 2 index known .char.map 2 index known and
    { .char.map 3 -1 roll get .char.map 3 -1 roll get concatstrings
      .char.map 3 1 roll put
    }
    { pop pop pop
    }
   ifelse
 }
for .chars.def

% Encode the remaining standard and ISO alphabetic characters.

mark
  /AE (AE)  /Eth (DH)  /OE (OE)  /Thorn (Th)
  /ae (ae)  /eth (dh)
  /ffi (ffi)  /ffl (ffl)  /fi (fi)  /fl (fl)
  /germandbls (ss)  /oe (oe)  /thorn (th)
.chars.def

% Encode the other standard and ISO characters.

mark
  /brokenbar (|)  /bullet (*)  /copyright ((C))  /currency (#)
  /dagger (#)  /daggerdbl (##)  /degree (o)  /divide (/)  /dotaccent (.)
  /dotlessi (i)
  /ellipsis (...)  /emdash (--)  /endash (-)  /exclamdown (!)
  /florin (f)  /fraction (/)
  /guillemotleft (<<)  /guillemotright (>>)
  /guilsinglleft (<)  /guilsinglright (>)  /hungarumlaut ("")  /logicalnot (~)
  /macron (_)  /minus (-)  /mu (u)  /multiply (*)
  /ogonek (,)  /onehalf (1/2)  /onequarter (1/4)  /onesuperior (1)
  /ordfeminine (-a)  /ordmasculine (-o)
  /paragraph (||)  /periodcentered (*)  /perthousand (o/oo)  /plusminus (+-)
  /questiondown (?)  /quotedblbase (")  /quotedblleft (")  /quotedblright (")
  /quotesinglbase (,)  /quotesingle (')  /registered ((R))
  /section ($)  /sterling (#)
  /threequarters (3/4)  /threesuperior (3)  /trademark ((TM))  /twosuperior (2)
  /yen (Y)
.chars.def

% Encode a few common Symbol characters.

mark
  /asteriskmath (*)
  /copyrightsans ((C))  /copyrightserif ((C))
  /greaterequal (>=)  /lessequal (<=)
  /registersans ((R))  /registerserif ((R))
  /trademarksans ((TM))  /trademarkserif ((TM))
.chars.def

%%* Add a few characters from StandardEncoding and ISOLatin1Encoding
%%* that were missing.

mark
  /cent (c)
  /guilsinglleft (<)
  /guilsinglright (>)
  /breve (*)
  /Lslash (L/)
  /lslash (l/)
.chars.def
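
% For example (assuming this file has been loaded), the ISO loop above
% builds each accented letter from the translation of its base letter
% followed by the ASCII stand-in for its accent; names the loop cannot
% split that way fall through to the explicit tables:
%
%       .char.map /Agrave get ==        % prints (A`)
%       .char.map /ccedilla get ==      % prints (c,)
%       .char.map /germandbls get ==    % prints (ss)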

%%*
%%* Define the OT1Encoding and T1Encoding vectors for use with dvips
%%* files.  Unfortunately, there's no way of telling what font is
%%* really being used within a dvips document, so we can't provide an
%%* appropriate encoding for each individual font.  Instead, we'll
%%* just provide support for the two most popular text encodings, the
%%* OT1 and T1 encodings, and just accept the fact that any font not
%%* using one of those encodings will be rendered as gibberish.
%%*
%%* OT1 is Knuth's 7-bit encoding for the CMR text fonts, while T1
%%* (aka the Cork encoding) is the 8-bit encoding used by the DC
%%* fonts, a preliminary version of the proposed Extended Computer
%%* Modern fonts.  Unfortunately, T1 is not a strict extension of
%%* OT1; they differ in positions 8#000 through 8#040, 8#042, 8#074,
%%* 8#076, 8#134, 8#137, 8#173, 8#174, 8#175 and 8#177, so we can't
%%* use the same vector for both.
%%*
%%* Of course, we also can't reliably tell the difference between an
%%* OT1-encoded font and a T1-encoded font based on the information in
%%* a dvips-created PostScript file.  As a best-guess solution, we'll
%%* use the T1 encoding if the font contains any characters in
%%* positions above 8#177 and the OT1 encoding if it doesn't.
%%*
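
% A sketch of the rule above as dvips.df-tail (below) applies it to the
% maxcc argument of df-tail.  This assumes maxcc is one more than the
% highest character code, which is how it is used: it sizes the base
% string that dvips.D indexes by character code.  Only fonts with a
% character above 8#177 therefore get T1Encoding:
%
%       256 128 gt { (T1) } { (OT1) } ifelse ==         % prints (T1)
%       128 128 gt { (T1) } { (OT1) } ifelse ==         % prints (OT1)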

/T1Encoding 256 array def
/OT1Encoding 256 array def

%%* T1Encoding shares a lot with StandardEncoding, so let's start
%%* there.

StandardEncoding T1Encoding copy pop

/OT1.encode {
    counttomark 2 idiv
     { OT1Encoding 3 1 roll put }
    repeat
    cleartomark
} def

/T1.encode {
    counttomark 2 idiv
     { T1Encoding 3 1 roll put }
    repeat
    cleartomark
} def

mark
    8#000 /grave           8#001 /acute           8#002 /circumflex      8#003 /tilde
    8#004 /dieresis        8#005 /hungarumlaut    8#006 /ring            8#007 /caron
    8#010 /breve           8#011 /macron          8#012 /dotaccent       8#013 /cedilla
    8#014 /ogonek          8#015 /quotesinglbase  8#016 /guilsinglleft   8#017 /guilsinglright
    8#020 /quotedblleft    8#021 /quotedblright   8#022 /quotedblbase    8#023 /guillemotleft
    8#024 /guillemotright  8#025 /endash          8#026 /emdash          8#027 /cwm
    8#030 /perthousandzero 8#031 /dotlessi        8#032 /dotlessj        8#033 /ff
    8#034 /fi              8#035 /fl              8#036 /ffi             8#037 /ffl

    %% 8#040 through 8#176 follow StandardEncoding

    8#177 /hyphen
T1.encode

mark
    8#200 /Abreve          8#201 /Aogonek         8#202 /Cacute          8#203 /Ccaron
    8#204 /Dcaron          8#205 /Ecaron          8#206 /Eogonek         8#207 /Gbreve
    8#210 /Lacute          8#211 /Lcaron          8#212 /Lslash          8#213 /Nacute
    8#214 /Ncaron          8#215 /Eng             8#216 /Ohungarumlaut   8#217 /Racute
    8#220 /Rcaron          8#221 /Sacute          8#222 /Scaron          8#223 /Scedilla
    8#224 /Tcaron          8#225 /Tcedilla        8#226 /Uhungarumlaut   8#227 /Uring
    8#230 /Ydieresis       8#231 /Zacute          8#232 /Zcaron          8#233 /Zdot
    8#234 /IJ              8#235 /Idot            8#236 /dbar            8#237 /section
    8#240 /abreve          8#241 /aogonek         8#242 /cacute          8#243 /ccaron
    8#244 /dcaron          8#245 /ecaron          8#246 /eogonek         8#247 /gbreve
    8#250 /lacute          8#251 /lcaron          8#252 /lslash          8#253 /nacute
    8#254 /ncaron          8#255 /eng             8#256 /ohungarumlaut   8#257 /racute
    8#260 /rcaron          8#261 /sacute          8#262 /scaron          8#263 /scedilla
    8#264 /tcaron          8#265 /tcedilla        8#266 /uhungarumlaut   8#267 /uring
    8#270 /ydieresis       8#271 /zacute          8#272 /zcaron          8#273 /zdot
    8#274 /ij              8#275 /exclamdown      8#276 /questiondown    8#277 /sterling
    8#300 /Agrave          8#301 /Aacute          8#302 /Acircumflex     8#303 /Atilde
    8#304 /Adieresis       8#305 /Aring           8#306 /AE              8#307 /Ccedilla
    8#310 /Egrave          8#311 /Eacute          8#312 /Ecircumflex     8#313 /Edieresis
    8#314 /Igrave          8#315 /Iacute          8#316 /Icircumflex     8#317 /Idieresis
    8#320 /Eth             8#321 /Ntilde          8#322 /Ograve          8#323 /Oacute
    8#324 /Ocircumflex     8#325 /Otilde          8#326 /Odieresis       8#327 /OE
    8#330 /Oslash          8#331 /Ugrave          8#332 /Uacute          8#333 /Ucircumflex
    8#334 /Udieresis       8#335 /Yacute          8#336 /Thorn           8#337 /Germandbls
    8#340 /agrave          8#341 /aacute          8#342 /acircumflex     8#343 /atilde
    8#344 /adieresis       8#345 /aring           8#346 /ae              8#347 /ccedilla
    8#350 /egrave          8#351 /eacute          8#352 /ecircumflex     8#353 /edieresis
    8#354 /igrave          8#355 /iacute          8#356 /icircumflex     8#357 /idieresis
    8#360 /eth             8#361 /ntilde          8#362 /ograve          8#363 /oacute
    8#364 /ocircumflex     8#365 /otilde          8#366 /odieresis       8#367 /oe
    8#370 /oslash          8#371 /ugrave          8#372 /uacute          8#373 /ucircumflex
    8#374 /udieresis       8#375 /yacute          8#376 /thorn           8#377 /germandbls
T1.encode
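
% For example (assuming this file has been loaded), T1Encoding keeps
% StandardEncoding's names except where the T1.encode calls above
% overwrite them:
%
%       T1Encoding 8#177 get ==                 % prints /hyphen
%       StandardEncoding 8#177 get ==           % prints /.notdef
%       T1Encoding 8#101 get ==                 % prints /A (copied from StandardEncoding)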

%%* Now copy T1Encoding into OT1Encoding and make a few changes.

T1Encoding OT1Encoding copy pop

mark
    8#000 /Gamma           8#001 /Delta           8#002 /Theta           8#003 /Lambda
    8#004 /Xi              8#005 /Pi              8#006 /Sigma           8#007 /Upsilon
    8#010 /Phi             8#011 /Psi             8#012 /Omega           8#013 /ff
    8#014 /fi              8#015 /fl              8#016 /ffi             8#017 /ffl
    8#020 /dotlessi        8#021 /dotlessj        8#022 /grave           8#023 /acute
    8#024 /caron           8#025 /breve           8#026 /macron          8#027 /ring
    8#030 /cedilla         8#031 /germandbls      8#032 /ae              8#033 /oe
    8#034 /oslash          8#035 /AE              8#036 /OE              8#037 /Oslash
    8#040 /polishslash     8#042 /quotedblright
    8#074 /exclamdown      8#076 /questiondown
    8#134 /quotedblleft    8#137 /dotaccent
    8#173 /endash          8#174 /emdash          8#175 /hungarumlaut    8#177 /dieresis
OT1.encode

%%* And add translations for a few more characters from the OT1 and
%%* T1 encodings.

mark
    /Gamma (\\Gamma )   /Delta (\\Delta )   /Theta (\\Theta )   /Lambda (\\Lambda )
    /Xi (\\Xi )         /Pi (\\Pi )         /Sigma (\\Sigma )   /Upsilon (\\Upsilon )
    /Phi (\\Phi )       /Psi (\\Psi )       /Omega (\\Omega )
    /dotlessj (j)  /ff (ff)  /cwm ()  /perthousandzero (0)  /polishslash ()
    /Abreve (A*)  /Aogonek (A,)  /Cacute (C')  /Ccaron (C^)  /Dcaron (D^)
    /Ecaron (E^)  /Eogonek (E,)  /Gbreve (G*)  /Lacute (L')  /Lcaron (L^)
    /Nacute (N')  /Ncaron (N^)  /Eng (NG)  /Ohungarumlaut (O"")  /Racute (R')
    /Rcaron (R^)  /Sacute (S')  /Scaron (S^)  /Scedilla (S,)  /Tcaron (T^)
    /Tcedilla (T,)  /Uhungarumlaut (U"")  /Uring (U*)  /Ydieresis (Y")
    /Zacute (Z')  /Zcaron (Z^)  /Zdot (Z.)  /IJ (IJ)  /Idot (I.)  /dbar (d-)
    /abreve (a*)  /aogonek (a,)  /cacute (c')  /ccaron (c^)  /dcaron (d^)
    /ecaron (e^)  /eogonek (e,)  /gbreve (g*)  /lacute (l')  /lcaron (l^)
    /nacute (n')  /ncaron (n^)  /eng (ng)  /ohungarumlaut (o"")  /racute (r')
    /rcaron (r^)  /sacute (s')  /scaron (s^)  /scedilla (s,)  /tcaron (t^)
    /tcedilla (t,)  /uhungarumlaut (u"")  /uring (u*)  /zacute (z')
    /zcaron (z^)  /zdot (z.)  /ij (ij)  /Germandbls (SS)
.chars.def
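
% For example (assuming this file has been loaded), position 8#173 holds
% an en dash in OT1 but keeps StandardEncoding's brace in T1, and either
% name is then translated through .char.map, as .show.write does:
%
%       OT1Encoding 8#173 get ==                % prints /endash
%       T1Encoding 8#173 get ==                 % prints /braceleft
%       .char.map OT1Encoding 8#173 get get ==  % prints (-)
%       .char.map /Gamma get ==                 % prints (\\Gamma )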

%%* We extend the df-tail command to stick in an Encoding vector (see
%%* above for a discussion of the T1 and OT1 encodings), put in a
%%* FontName (which will just be dvips's name for the font, i.e., Fa,
%%* Fb, etc.) and give each font a separate FontBBox instead of
%%* letting them all share a single one.

/dvips.df-tail      % id numcc maxcc df-tail
{
    /nn 9 dict N
    nn begin
        %%
        %% Choose an encoding based on the highest position occupied.
        %%
        dup 128 gt { T1Encoding } { OT1Encoding } ifelse
        /Encoding X
        /FontType 3 N
        %%
        %% It's ok for all the fonts to share a FontMatrix, but they
        %% need to have separate FontBBoxes
        %%
        /FontMatrix fntrx N
        /FontBBox [0 0 0 0] N
        string /base X
        array /BitMaps X
        %%
        %% And let's throw in a FontName for good measure
        %%
        dup (          ) cvs
        %%
        %% Make sure each font gets its own private FontName.  -- dmj,
        %% 12/23/97
        %%
        dup length string copy
        /FontName X
        /BuildChar {CharBuilder} N
    end
    dup { /foo setfont }
    2 array copy cvx N
    load 0 nn put
    /ctr 0 N
    [
} def

%%* This is functionally equivalent to dvips's /D procedure, but it
%%* also calculates the Font Bounding Box while defining the
%%* characters.

/dvips.D        % char-data ch D - : define character bitmap in current font
{
    /cc X                               % char-data
    dup type /stringtype ne {]} if      % char-data

    /ch-xoff where {
        pop
    } {
        dup /Cd exch def
        /ch-width { Cw } def
        /ch-height { Ch } def
        /ch-xoff { Cx } def
        /ch-yoff { Cy } def
        /ch-dx { Cdx } def
    } ifelse

    /ch-data X

    nn /base get cc ctr put             % (adds ctr to cc'th position of BASE)
    nn /BitMaps get ctr ch-data         % BitMaps ctr char-data
    sf 1 ne {
        dup dup length 1 sub dup 2
        index S get sf div put
    } if
    put                                 % puts char-data into BitMaps at index ctr
    /ctr ctr 1 add N

    %%
    %% Make sure the Font Bounding Box encloses the Bounding Box of the
    %% current character
    %%
    nn /FontBBox get                    % BB

    dup                                 % calculate new llx
        dup 0 get
        ch-xoff
        .min
        0 exch put

    dup                                 % calculate new lly
        dup 1 get
        ch-yoff ch-height sub
        .min
        1 exch put

    dup                                 % calculate new urx
        dup 2 get
        ch-dx ch-width add
        .max
        2 exch put

    dup 3 get                           % calculate new ury
        ch-yoff
        .max
        3 exch put

} def

%%* Define start-hook to replace df-tail and D by our versions.
%%* Unfortunately, the user can redefine start-hook and thus bypass
%%* these changes, but I don't see an obvious way around that.

userdict /start-hook {
    TeXDict /df-tail /dvips.df-tail load bind put
    TeXDict /D /dvips.D load bind put
} put

%%* Introduce a symbolic constant for hyphens.  (Need to make
%%* allowance for hyphen being in different place?)

/.hyphen 45 def

% Write out a string.  If it ends in a letter and a hyphen,
% don't write the hyphen, and set .show.last to a hyphen;
% otherwise, set .show.last to the character (or \000 if it was a hyphen).

/.show.write    % <string>
{
    dup length 1 ge
    {   dup dup length 1 sub get        % string last_char
        dup .hyphen eq                  % string last_char hyphen?
        {                               % string last_char
            1 index length 1 gt
              { 1 index dup length 2 sub get }
              { //.show.last 0 get }
            ifelse                      % string last_char prev-char
            currentfont /Encoding get exch get  % look up prev-char
            //.letter.chars exch known  % is it a letter?
            {   % Remove the hyphen
                                        % string last_char
                exch                    % last_char string
                dup length 1 sub        % last_char string len-1
                0 exch getinterval      % last_char string-1
                exch                    % string-1 last_char
            }
            { pop 0 }                   % string 0
            ifelse
        } if
        //.show.last 0 3 -1 roll put   % store last_char in .show.last
                                        % If .show.last == hyphen, then
                                        % last char of previous string
                                        % was a hyphen
    } if
                                        % string
    currentfont /FontType get 0 ne
    {
        {                               % begin forall
                                        % c
            dup                         % c c
            currentfont /Encoding get   % c c vec
            exch get                    % c name
            dup //.char.map exch known  % c name bool
              { exch pop }
              { pop OT1Encoding exch get }
            ifelse                      % name
            //.char.map exch get        % translation
            .show.stdout exch writestring
        } forall
    }
    {
        (\0) dup 0 get 0 eq {
            0 1 put
            (%stderr) (w) file
            dup (*** Warning: composite font characters dumped without decoding.\n) writestring
            closefile
        } {
            pop
        } ifelse
        .show.stdout exch writestring
    } ifelse
} odef

/.showstring1 {                         % string
    currentpoint .coord                 % string x y

  3 -1 roll dup .showwidth                 % x y string dx dy
  1 index                                  % x y string dx dy dx
  0 rmoveto                                % x y string dx dy
  .dcoord pop                              % x y string width
  SIMPLE {                                 % x y string width
    2 index                                % x y string width y
    //.show.y .iget                        % x y string width y old.y
%%*
%%* Replaced test "has y changed" by "has y changed by more
%%* than the current font height" so that subscripts and
%%* superscripts won't cause line/paragraph breaks
%%*
    sub abs dup                            % x y string width dy dy
    //.show.height .iget gt {              % x y string width dy
%%*
%%* Vertical position has changed by more than the font height, so
%%* we now try to figure out whether we've started a new paragraph
%%* or merely a new line, using a variety of heuristics.
%%*
%%* If any of the following is true, we start a new paragraph:
%%*
%%* (a) the current vertical shift is more than 1.1 times
%%*     the previous vertical shift, where 1.1 is an
%%*     arbitrarily chosen factor that could probably be
%%*     refined.
%%*
      dup                                  % x y string width dy dy
      //.show.dy .iget 1.1 mul gt exch     % x y string width newpar? dy
%%*
%%* Save the new vertical shift
%%*
      //.show.dy exch .iput                % x y string width newpar?
%%*
%%* (b) The vertical shift is more than 1.3 times the
%%*     "size" of the current font.  I've removed this
%%*     test since it's not really very useful.
%%*
%%*   //.show.dy .iget //.show.height .iget 1.4 mul gt % x y string width bool
%%*   .show.height .iget 0 gt and  % only perform test if font
%%*                                % height is nonzero
%%*   or
%%*
%%* (c) the first character of the new line is one of the
%%*     .break.chars
%%*
      2 index length                       % x y string width newpar? len
      0 gt                                 % x y string width newpar? len>0?
      {
        2 index 0 get                      % x y string width newpar? s
        currentfont /Encoding get exch get % x y string width newpar? s_enc
        //.break.chars exch known { pop true } if
      } if                                 % x y string width newpar?
%%*
%%* (d) The indentation of the new line is greater than
%%*     the indentation of the previous line.
%%*
      4 index //.show.indent .iget gt or
%%*
%%* HOWEVER, if the line ends in a hyphen, we do NOT begin
%%* a new paragraph (cf. comment at end of BF2).  --dmj,
%%* 12/23/97
%%*
      //.show.last 0 get .hyphen ne and    % newpar?
      { (\n\n) }                           % Paragraph
      {                                    % Line
%%*
%%* BF2: If the last character on a line is a hyphen, we omit
%%* the hyphen and run the lines together.  Of course, this
%%* will fail if a word with an explicit hyphen (e.g., X-ray)
%%* is split across two lines.  Oh, well.  (What should we do
%%* about a hyphen that ends a "paragraph"?  Perhaps that
%%* should inhibit a paragraph break.)
%%*
        //.show.last 0 get .hyphen eq
          { () } { (\n) } ifelse
      } ifelse                             % x y string width char
      //print
      //.show.y 3 index .iput              % x y string width
      //.show.x 4 index .iput              % x y string width
      //.show.indent 4 index .iput
    } {                                    % x y string width dy
        % If the word processor split a hyphenated word within
        % the same line, put out the hyphen now.
      pop //.show.last 0 get .hyphen eq { (-) //print } if
    } ifelse
%%*
%%* If we have moved more than 1 point to the right, interpret it
%%* as a space?  This needs to be looked at more closely.
%%*
    3 index                                % x y string width x
    //.show.x .iget 10 add gt              % x y string width bool
    { ( ) //print } if
    4 1 roll                               % width x y string
    .show.write pop                        % width x
    add //.show.x exch .iput               % <empty>
  } {
    (S ) //print .show==4
  } ifelse
} odef
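
% Worked example of the SIMPLE-mode heuristics in .showstring1 and
% .show.write.  The numbers are illustrative and do not come from a real
% run.  Coordinates are in 1/720" units.  The example assumes that
% .show.height holds the font height in those same units, that .hyphen
% is the code of "-", and that letters such as /m and /X are in
% .letter.chars.
%
% Each case starts from the same state: font height 100 (10 pt) and
% .show.dy = 120.  The new line starts at the same x as .show.indent,
% its first character is not in .break.chars, and the previous string
% did not end in a hyphen.
%
%   dy = 120  120 > 100, so the line has changed.  (a) tests
%             120 > 1.1 * 120 = 132, which is false, and (c) and (d) do
%             not apply, so (\n) is printed.
%   dy = 180  180 > 132, so (a) holds and (\n\n) is printed;
%             .show.dy becomes 180.
%   dy = 30   30 > 100 is false (e.g. a subscript), so no break is
%             printed.
%
% Word spacing: suppose the previous string ended at .show.x = 1300.
% If the next string on the same line starts at x = 1340, then
% 1340 > 1300 + 10 and a space is printed.  At x = 1305 (e.g. after a
% kern) nothing is printed.
%
% Hyphenation: for a line ending in (exam-), the character before the
% hyphen is /m, a letter, so .show.write writes "exam" and stores the
% hyphen in .show.last.  On the next line the paragraph test is
% suppressed and the separator is (), so (ple) completes "example".
% If (X-) and (ray) are on the same line instead, the stored hyphen is
% printed before (ray), giving "X-ray".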

/.showstring {
  dup () eq { pop } { .showstring1 } ifelse
} bind def

% Redefine all the string display operators.

/show {
  .showfont .showcolor .showstring
} codef

% We define all the other operators in terms of .show1.

/.show1.string ( ) def
/.show1 {
  //.show1.string exch 0 exch put
  //.show1.string .showstring
} odef

/ashow {
  .showfont .showcolor
  { .show1 2 copy rmoveto } forall
  pop pop
} codef

/awidthshow {
  .showfont .showcolor
  { dup .show1          % Stack: cx cy char ax ay c
    3 index eq          % does c match char?
    { 4 index 4 index rmoveto } if
    2 copy rmoveto
  } forall
  pop pop pop pop pop
} codef

/widthshow {
  .showfont .showcolor
  //.show1.string 0 4 -1 roll put
  { //.show1.string search not { exit } if
    .showstring .showstring
    2 index 2 index rmoveto
  } loop
  .showstring pop pop
} codef

/kshow {
  .showfont .showcolor
%**************** Should construct a closure, in case the procedure
%**************** affects the o-stack.
  { .show1 dup exec } forall
  pop
} codef

% We don't really do the right thing with the Level 2 show operators,
% but we do something semi-reasonable.

/xshow { pop show } codef
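
% Worked trace of the widthshow emulation.  This is an illustration
% only, using an arbitrary example call:
%   cx cy 32 (to be) widthshow        % 32 is the space character
% .show1.string becomes ( ).  Searching (to be) for ( ) yields
% (be) ( ) (to) true, so (to) and then ( ) are passed to .showstring,
% and the current point moves by cx cy.  Searching (be) then fails,
% the loop exits, (be) is passed to .showstring, and pop pop discards
% cx and cy.  When the string begins with the character, the first
% search yields an empty pre string, which .showstring drops through
% its () eq test.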

/yshow { pop show } codef
/xyshow { pop show } codef

/glyphshow {
  currentfont /Encoding .knownget not { {} } if
  0 1 2 index length 1 sub {
        % Stack: glyph encoding index
    2 copy get 3 index eq { exch pop exch pop null exit } if
    pop
  } for
  null eq { (X) dup 0 4 -1 roll put show } { pop } ifelse
} codef

end

% Bind the operators we just defined, and all the others if we didn't
% do it before.
DELAYBIND { .bindnow } if

% Make systemdict read-only if it wasn't already.
systemdict wcheck { systemdict readonly pop } if

% Restore the current local/global VM mode.
exec
