% verbatimfile 2022.03.21
% D. J. Bernstein
% Public domain.
%
% Knuth's original manual for TeX, the TeXbook, includes many snippets of
% TeX input. For example, the TeXbook presents a snippet
%
%    George P\'olya and Gabor Szeg\"o
%
% and says that TeX converts this to "George Pólya and Gabor Szegö". The
% snippet is monospaced (i.e., typewriter font), closely matching the way
% that people typically read and write TeX input.
%
% What Knuth actually wrote in texbook.tex at this point was the following:
%
%    Here's another example:
%    \begintt
%    George P\'olya and Gabor Szeg\"o.
%    \endtt
%    \TeX\ converts this to `George P\'olya and Gabor Szeg\"o.'
%
% Knuth had set up \begintt and \endtt macros so that TeX would display
% the text between \begintt and \endtt verbatim, rather than converting
% P\'olya into Pólya and Szeg\"o into Szegö.
%
% The underlying "verbatim" commands in plain TeX evolved into LaTeX's
% verbatim command, and LaTeX's verbatim package, and LaTeX's fancyvrb
% package, and LaTeX's listings package, and so on, with various features
% for tweaking the output (e.g., syntax highlighting) and processing the
% input.
%
% But there's a problem. If you point (say) Firefox to the first verbatim
% snippet on page 2 of
%
%    https://mirrors.ctan.org/macros/latex/required/tools/verbatim.pdf
%
% or the verbatim snippet on the top right of page 5 of
%
%    https://mirrors.ctan.org/macros/latex/contrib/listings/listings.pdf
%
% and try _copying and pasting_ the snippet then you'll see that the paste
% doesn't include the leading spaces and blank lines from the snippet. For
%
%    https://mirrors.ctan.org/macros/latex/contrib/fancyvrb/doc/fancyvrb-doc.pdf
%
% this is partially masked by a bigger copy-and-paste problem, which is
% that the examples are given with line numbers, but if you try copying
% and pasting the left-and-right-side examples in Section 4.1.2 then
% you'll see that leading spaces are again not preserved.
%
% This isn't just a question of appearance of the paste. A pasted Python
% script typically won't work. One can resort to attachments or links, but
% there's value in self-contained documents; how about fixing the tools so
% that copy-and-paste works correctly?
%
% The objective of verbatimfile is to include a file verbatim inside a TeX
% document while allowing the file to be copied and pasted from the PDF,
% preserving leading spaces, intermediate spaces, and blank lines. Usage
% is
%
%    \input verbatimfile
%    ...
%    \verbatimfiledisplay{xyzzy.py}
%
% for a display inside a paragraph (prefixed by four spaces that aren't
% copied; vertical list spacing above and below), or
%
%    \input verbatimfile
%    ...
%    \verbatimfile{xyzzy.py}
%
% for text inside a figure (no prefix, no extra vertical spacing). There's
% also \verbatimfileprefixeddisplay and \verbatimfileprefixed with a
% second argument specifying a prefix for each line.
%
% This is currently at a "does something useful enough for me to use it"
% stage: Firefox successfully copies and pastes Python scripts from PDFs,
% including leading spaces and blank lines. There's still more to do:
% making this work with other PDF readers, handling tabs, making sure that
% malicious files can't invoke TeX commands, turning this into a package,
% integrating the ideas into other packages, etc.
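%
% As a fuller sketch of the first form, a complete LaTeX document might
% look as follows. This assumes that verbatimfile.tex and a script called
% xyzzy.py (a placeholder name) are in the same directory as the document,
% and that the document is compiled with pdflatex or lualatex:
%
%    \documentclass{article}
%    \input verbatimfile
%    \begin{document}
%    Here is the script:
%    \verbatimfiledisplay{xyzzy.py} % xyzzy.py is a placeholder file name
%    \end{document}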
%
% The copy-and-paste problem has been considered many times before, in
% general with people sounding unhappy:
%
%    https://tex.stackexchange.com/questions/19949/how-to-make-listings-code-indentation-remain-unchanged-when-copied-from-pdf
%    https://tex.stackexchange.com/questions/62221/ensure-verbatim-code-block-is-copy-paste-able
%    https://tex.stackexchange.com/questions/142617/copy-pasting-leading-whitespace-and-blank-lines-in-listings-package-pdf
%    https://tex.stackexchange.com/questions/148144/viewer-independent-copyable-spaces-at-the-beginning-of-a-line
%    https://tex.stackexchange.com/questions/195489/how-to-copy-paste-multiple-spaces-from-lstlistings
%    https://tex.stackexchange.com/questions/323294/produce-copy-paste-able-pdf-output-with-correct-indentation-with-listing
%    https://tex.stackexchange.com/questions/417259/how-to-keep-indentations-in-python-code-copied-from-latex-pdf
%    https://tex.stackexchange.com/questions/563803/how-make-a-latex-document-that-generates-a-pdf-from-which-copy-paste-works-corre
%    https://www.monperrus.net/martin/copy-pastable-listings-in-pdf-from-latex
%
% Sometimes people point to PDF's "ActualText" feature, typically via
% LaTeX's accsupp package. This feature is supposed to override what's
% pasted. Maybe this is exactly what it does in Acrobat Reader, but in
% Firefox it seems to supplement what's pasted, and in Chromium and Evince
% it doesn't seem to work at all.
%
% The approach from Denis Ryabov in
%
%    https://tex.stackexchange.com/questions/417259/how-to-keep-indentations-in-python-code-copied-from-latex-pdf
%
% instead uses "\pdffakespace", which is supposed to insert an empty
% object that copies as a space. This has been in pdflatex since 2014 but
% doesn't seem to be available in lualatex or xelatex.
%
% Why hasn't everything always worked? If there's an ASCII space or LF in
% the input, why isn't TeX passing it through as a space or LF to PDF, and
% why isn't the PDF reader copying it as a space or LF?
%
% TeX normally converts an input "hello world" into two boxes connected by
% glue. The first box says "hello"; the second box says "world"; the glue
% can stretch or shrink, typically for right-justifying a line of text.
% The glue doesn't say that it was something the user meant as a space
% between words. The left margin in a quoted paragraph is also glue but
% has a different meaning for the user.
%
% TeX decides where exactly to put the boxes on the page. It then produces
% a PDF saying that "hello" is at position (x_1,y_1), and "world" is at
% position (x_2,y_2). Some readers will have trouble copying and pasting
% "hello world". Some readers will guess based on the (x,y) positions that
% "hello world" was intended. A few readers (e.g., pdftotext -layout) will
% try to guess the number of spaces between "hello" and "world" based on
% the (x,y) positions.
%
% As an optimization, if "hello" and "world" are lined up horizontally and
% have the expected space between them, then TeX produces a PDF saying
% that "hello world" is at position (x_1,y_1). Readers will then reliably
% copy and paste "hello world". But TeX won't do this for spaces before
% "hello".
%
% Why should verbatim spaces be converted into glue in the first place?
% The main feature of glue, namely the ability to stretch and shrink, is
% irrelevant here. So why not have the spaces converted into "\char32",
% which tells TeX to create a box with character 32, an ASCII space?
%
% The basic problem here is that TeX never bothered putting a space
% character into its fonts. If you type (one line at a time, not as a
% four-line copy-and-paste!)
%
%    pdftex testfont
%    cmtt10
%    \table
%    \bye
%
% and look at the resulting testfont.pdf then you'll see that character 32
% in TeX's basic typewriter font, cmtt10, is an open box, similar to |_|
% but with shorter vertical bars and connected at the bottom. In Unicode
% this is U+2423, ␣, also known as &blank; in HTML.
% This is supported as a
% "visible space" option in typical verbatim packages, and occasionally
% that's what the user wants, but it isn't what the user normally wants,
% namely a space character.
%
% There are many ways to make the open box invisible in the PDF (for
% example, putting \pdfliteral{3 Tr} before it and \pdfliteral{0 Tr}
% after), so that it looks like a space character. But what happens when
% the open box is copied and pasted?
%
% A 2018 comment from Frank Mittelbach in
%
%    https://tex.stackexchange.com/questions/448734/without-loading-fontspec-verb-cannot-produce-visible-space-under-xelatex
%
% said that open boxes use "the character in slot 32 as that works
% correctly if you cut and paste from a pdf (producing a space then)".
% But, no, the character is marked as U+2423 despite being in slot 32, and
% modern PDF readers copy it as an open box, again breaking Python. It's
% hard to argue against marking the character as U+2423 given that it is,
% in fact, an open box; the real problem is that the font doesn't have a
% space character.
%
% The way verbatimfile works around this is by using another monospace
% font that _does_ have a space character in slot 32: pcrr8r. Maybe the
% user doesn't want to have verbatim listings showing up as pcrr rather
% than as cmtt; but verbatimfile simply grabs the space character from
% pcrr8r, while using \tt for the displayed text. There's usually a
% mismatch between the pcrr8r width and the \tt width; verbatimfile deals
% with this by scaling pcrr8r up or down so that the widths match. The
% heights are usually slightly off, which is visible when the user
% highlights text to copy in a PDF reader, but this is a minor issue.
%
% For lualatex and xelatex, everything is easier.
% The fonts are designed
% around Unicode to begin with, and in particular the normal \tt fonts
% _do_ have a space character in slot 32 (putting the burden on "visible
% space" features to find an open box somewhere else), so verbatimfile
% simply uses \char32, although still making it invisible just in case.
%
% Blank lines disappear for much the same reason that initial spaces do:
% TeX simply specifies the exact vertical position of each line without
% putting any LF characters into the PDF. Typical PDF readers will take
% any change of vertical position as a new line.
%
% Rather than letting blank lines disappear, verbatimfile artificially
% turns each blank line into a line with a single CR character. Firefox
% turns the CR character into a space. Maybe someday Firefox can be
% convinced to remove the CR; anyway, the space doesn't cause problems for
% Python. Actually, verbatimfile inserts CR with lualatex or xelatex, but
% space with pdflatex, since CR with pdflatex breaks copy-and-paste in a
% way that I haven't diagnosed.
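%
% As a rough worked example of the pcrr8r width matching done by the code
% below: assume that the current \tt font is cmtt10, with \fontdimen2 equal
% to 5.24996pt, and that pcrr8r, being Courier, has a space width of 0.6
% times its font size. Both numbers are assumptions to be checked against
% the actual .tfm files. The search should then settle near
%
%    5.24996pt / 0.6 = 8.74993pt
%
% so the borrowed space would come from pcrr8r at roughly 8.75pt. A sketch
% for comparing the two widths by hand with pdftex (the names \fonttt and
% \fontcr are made up for this illustration):
%
%    \font\fonttt=cmtt10
%    \font\fontcr=pcrr8r at 8.75pt
%    \message{\the\fontdimen2\fonttt\space vs \the\fontdimen2\fontcr}
%    \bye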
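\makeatletter
\ifx\pdfextension\@undefined
  % Engines without \pdfextension (e.g., pdflatex): borrow the space
  % character from pcrr8r. \verbatimfile@spacingsetup bisects on the
  % pcrr8r size until \fontdimen2 of pcrr8r matches \fontdimen2 of the
  % current font, then stores the space in a box of zero height and depth.
  \def\verbatimfile@spacingsetup{\relax
    \newdimen\verbatimfile@fontwidth
    \verbatimfile@fontwidth\fontdimen2\font
    \newdimen\verbatimfile@spacefontlower
    \newdimen\verbatimfile@spacefontupper
    \newdimen\verbatimfile@spacefontmid
    \verbatimfile@spacefontlower 1sp\relax
    \verbatimfile@spacefontupper 1000pt\relax
    \loop\ifdim\verbatimfile@spacefontupper>\verbatimfile@spacefontlower
      \verbatimfile@spacefontmid\verbatimfile@spacefontlower
      \advance\verbatimfile@spacefontmid\verbatimfile@spacefontupper
      \divide\verbatimfile@spacefontmid 2\relax
      \font\verbatimfile@spacefont pcrr8r at\verbatimfile@spacefontmid
      \ifdim\fontdimen 2\verbatimfile@spacefont>\verbatimfile@fontwidth
        \verbatimfile@spacefontupper\verbatimfile@spacefontmid
      \else
        \verbatimfile@spacefontlower\verbatimfile@spacefontmid
        \advance\verbatimfile@spacefontlower 1sp\relax
      \fi
    \repeat
    \newbox\verbatimfile@spacebox
    \setbox\verbatimfile@spacebox\hbox{\verbatimfile@spacefont\char32}\relax
    \ht\verbatimfile@spacebox 0pt\relax
    \dp\verbatimfile@spacebox 0pt\relax
    \def\verbatimfile@spacing{\copy\verbatimfile@spacebox}\relax
  }
  \def\verbatimfile@blankline{\verbatimfile@spacing}\relax
  \let\verbatimfile@spacingsimplify\relax
\else
  % Engines with \pdfextension (e.g., lualatex): use \char32 of the
  % current font in invisible text rendering mode (3 Tr), and emit a CR
  % for a blank line. After the first non-space character of a line,
  % \verbatimfile@spacingsimplify switches to ordinary spaces.
  \let\verbatimfile@spacingsetup\relax
  \def\verbatimfile@spacing{\pdfliteral{3 Tr}\char32\pdfliteral{0 Tr}}\relax
  \def\verbatimfile@blankline{\pdfliteral{0 0 Td [<000d>]TJ}}\relax
  \def\verbatimfile@spacingsimplify{\def\verbatimfile@spacing{ }}\relax
\fi
% The next four lines are read with the space character active, so they
% must not be indented or reflowed.
\begingroup\catcode`\ =13\relax
\global\let\verbatimfile@spacechar \relax
\gdef\verbatimfile@activespace{\catcode`\ =13\let \verbatimfile@spacechar}\relax
\endgroup
% \verbatimchars and \verbatimsplit process one line token by token: spaces
% become \verbatimfile@spacing, other characters are typeset as they are,
% and a line with no characters ends in \verbatimfile@blankline. Each
% definition is read with ^^M active, so it must stay on a single line.
\begingroup\catcode`\^^M=13\gdef\verbatimchars#1{{\let\verbatimchar\relax\let\blankline\verbatimfile@blankline\expandafter\verbatimsplit#1^^M^^M\blankline}}\endgroup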
\begingroup\catcode`\^^M=13\gdef\verbatimsplit{\ifx\verbatimchar^^M\else\ifx\verbatimchar^^M\else\ifx\verbatimchar\relax\else\let\blankline\relax\ifx\verbatimchar\verbatimfile@spacechar\verbatimfile@spacing\else\verbatimfile@spacingsimplify\verbatimchar\fi\fi\fi\afterassignment\verbatimsplit\fi\let\verbatimchar=}\endgroup
% Read the open file one line at a time; each line goes into an \hbox of
% width \hsize, with \\ between consecutive lines.
\def\verbatimfile@process{\relax
  \read\verbatimfile@stream to\verbatimline
  \ifeof\verbatimfile@stream
  \else
    \separatelines
    \leavevmode\hbox to\hsize{\prefix\verbatimchars{\verbatimline}\hss}\relax
    \let\separatelines\\\relax
    \verbatimfile@process
  \fi
}
% Typeset file #1 verbatim in \tt, putting prefix #2 in front of each
% line. Special characters get category code 12 before the file is read.
\def\verbatimfile@prefixed#1#2{\relax
  \begingroup
  \def\prefix{#2}\relax
  \tt
  \verbatimfile@spacingsetup
  \frenchspacing
  \language\l@nohyphenation
  \@noligs
  \catcode`\{=12 \catcode`\}=12 \catcode`\$=12 \catcode`\&=12
  \catcode`\#=12 \catcode`\^=12 \catcode`\_=12 \catcode`\^^I=12
  \catcode`\~=12 \catcode`\|=12 \catcode`\%=12 \catcode`\\=12
  \catcode`\^^L=12 \catcode`\^^M=13
  \verbatimfile@activespace
  \newread\verbatimfile@stream
  \openin\verbatimfile@stream #1\relax
  \ifeof\verbatimfile@stream
    \errmessage{Nonexistent file #1}\relax
  \else
    \let\separatelines\relax
    \verbatimfile@process
    \closein\verbatimfile@stream
  \fi
  \endgroup
}
\def\verbatimfile@#1{\verbatimfile@prefixed{#1}{}}
\def\verbatimfile@prefixeddisplay#1#2{\begin{trivlist}\item\verbatimfile@prefixed{#1}{#2}\end{trivlist}}
\def\verbatimfile@display#1{\verbatimfile@prefixed{#1}{{ }{ }{ }{ }}}
% API:
\let\verbatimfileprefixed\verbatimfile@prefixed
\let\verbatimfile\verbatimfile@
\let\verbatimfileprefixeddisplay\verbatimfile@prefixeddisplay
\let\verbatimfiledisplay\verbatimfile@display
\makeatother