% language=uk

\startcomponent mk-arabic

\environment mk-environment

\chapter{Optimization}

\subject{quality of code}

How good is the \MKIV\ code? Well, as good as I can make it. When you browse
the code you will probably notice differences in coding style, and this is
related to the learning curve. For instance, the \type {luat-inp} module needs
some cleanup, such as hiding local functions from users.

Since benchmarking has been done right from the start there is probably not
that much to gain, but who knows. When coding in \LUA\ you should be careful
with defining global variables, since they may override something. In \MKIV\ we
don't guarantee that the name you use for a variable will not be used at some
point. Therefore it is best to operate in a dedicated \LUA\ instance, or to
operate in userspace.

\starttyping
do
    -- your code
end
\stoptyping

If you want to use your data later on, think of working this way (the example
is somewhat silly):

\starttyping
userdata['your.name'] = userdata['your.name'] or { }

do
    local mydata = userdata['your.name']

    mydata.data = {}

    local function foo() return 'bar' end

    function mydata.dothis()
        mydata[foo] = foo()
    end

end
\stoptyping

In this case you can always access your user data while temporary variables
are hidden. The \type {userdata} table is predefined, as is \type {thirddata}
for modules that you may write. Of course this assumes that you create a
namespace within these global tables.

A nice test for checking global clutter is the following:

\starttyping
for k, v in pairs(_G) do
    print(k, v)
end
\stoptyping

When you accidentally define global variables like \type {n} or \type {str}
they will show up here.

\subject{clean or dirty}

Processing the first 120 pages of this document (16 chapters) takes some 23.5
seconds on a Dell M90 (2.3 GHz, 4 GB memory, Windows Vista Ultimate).
A rough estimate of where \LUA\ spends its time is:

\starttabulate[|l|c|]
\NC \bf activity              \NC \bf sec \NC \NR
\NC input load time           \NC 0.114   \NC \NR
\NC fonts load time           \NC 6.692   \NC \NR
\NC mps conversion time       \NC 0.004   \NC \NR
\NC node processing time      \NC 0.832   \NC \NR
\NC attribute processing time \NC 3.376   \NC \NR
\stoptabulate

Font loading takes some time, which is no surprise because we load huge
Zapfino, Arabic and \CJK\ fonts and define many instances of them. Some tracing
shows that there are some 14.254.041 function calls, of which 13.339.226
concern functions that are called more than 5.000 times. A total of 62.434
functions are counted, which is a result of locally defined ones.

A rough indication of this overhead is given by the following test code:

\starttyping
local a,b,c,d,e,f = 1,2,3,4,5,6

function one  (a)           local n = 1 end
function three(a,b,c)       local n = 1 end
function six  (a,b,c,d,e,f) local n = 1 end

for i=1,14254041 do one  (a)           end
for i=1,14254041 do three(a,b,c)       end
for i=1,14254041 do six  (a,b,c,d,e,f) end
\stoptyping

The runtime for these tests (excluding startup) is:

\starttabulate[|l|l|]
\NC one argument    \NC 1.8 seconds \NC \NR
\NC three arguments \NC 2.0 seconds \NC \NR
\NC six arguments   \NC 2.3 seconds \NC \NR
\stoptabulate

So, of the total runtime for this document we easily spend a couple of seconds
on function calls, especially in node processing and attribute resolving. Does
this mean that we need to change the code and follow a more inline approach?
Eventually we may optimize some code, but for the moment we keep things as
readable as possible, and even then much code is still quite complex. Font
loading is often constant for a document anyway, and independent of the number
of pages. Time spent on node processing depends on the script. Scripts that
are intensive to process are often typeset in a larger font, and since they
are less verbose than Latin, this does not really influence the average time
spent on typesetting a page.
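
The overhead test above can be timed from within \LUA\ itself. The following
is only a minimal sketch, assuming a plain \LUA\ interpreter in which \type
{os.clock} is available. The helper \type {timed} and the reduced iteration
count are illustrative assumptions and not part of \MKIV. Because the helper
passes its arguments on as varargs, the numbers it reports are not directly
comparable to those in the table.

\starttyping
-- hypothetical helper (not part of MkIV): cpu time used by n calls of f
local function timed(n, f, ...)
    local start = os.clock()
    for i = 1, n do
        f(...)
    end
    return os.clock() - start
end

-- local variants of the three test functions
local function one  (a)           local n = 1 end
local function three(a,b,c)       local n = 1 end
local function six  (a,b,c,d,e,f) local n = 1 end

local count = 1000000 -- illustrative; the text above uses 14254041

local t1 = timed(count, one,   1)
local t3 = timed(count, three, 1, 2, 3)
local t6 = timed(count, six,   1, 2, 3, 4, 5, 6)

print(string.format("one argument    : %.3f seconds", t1))
print(string.format("three arguments : %.3f seconds", t3))
print(string.format("six arguments   : %.3f seconds", t6))
\stoptyping

Keeping the functions local, as in this sketch, also avoids the global
clutter discussed earlier.
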
Attribute handling is probably the most time consuming activity, and for large
documents the time spent on this is large compared to font loading and node
processing. But then, after a few \MKIV\ development cycles the picture may be
different.

When we turn on tracing of function calls, it becomes clear where the time is
currently spent in a document like this, which demands complex Zapfino
contextual analysis as well as Arabic analysis and feature application (both
fonts demand node insertion and deletion). Of course using color also has a
price. Handling weighted and conditional spacing (new in \MKIV) involves just
over 10.000 calls to the main handler for 120 pages of this document. Glyph
related processing of node lists needs 42.000 calls, and contextual analysis of
\OPENTYPE\ fonts accounts for 11.000 calls. Timing \LUA\ related tasks involves
2 times 37.000 calls to the stopwatch. The number of calls for collapsing \UTF\
in the input lines equals the number of lines: 7.700.

However, at the top of the charts we find calls to attribute related
functions: 97.000 calls for handling special effects, overprint, transparency
and the like, and another 24.000 calls for combined color and colorspace
handling. These calls result in over 6.000 insertions of \PDF\ literals (this
number is large because we show Arabic samples with color based tracing
enabled).

In case you wonder if the attribute handler can be made more efficient (we're
talking seconds here), the answer is \quotation {possibly not}. This action is
needed for each shipped out object and each shipped out page. If we divide the
24.000 (calls) by 120 (pages) we get 200 calls per page for color processing,
which is okay if you keep in mind that we need to recurse into the nested
horizontal and vertical lists of the completely made up page.

\subject{serialization}

When serializing tables, we can end up with very large tables, especially when
dealing with big fonts like \quote {arabtype} or \quote {zapfino}.
Here one has to find a compromise between speed of writing, efficiency of
loading and readability. First we had (sub)tables like:

\starttyping
boundingbox = {
    [1] = 0,
    [2] = 0,
    [3] = 100,
    [4] = 200
}
\stoptyping

I mistakenly assumed that this would generate an indexed table, but at \TUG\
2007 Roberto Ierusalimschy explained to me that this was not that efficient,
since this variant boils down to the following byte code:

\starttyping
1 [1] NEWTABLE  0 0 4
2 [2] SETTABLE  0 -2 -3  ; 1 0
3 [3] SETTABLE  0 -4 -3  ; 2 0
4 [4] SETTABLE  0 -5 -6  ; 3 100
5 [5] SETTABLE  0 -7 -8  ; 4 200
6 [6] SETGLOBAL 0 -1     ; boundingbox
7 [6] RETURN    0 1
\stoptyping

This creates a hashed table. The following variant is better:

\starttyping
boundingbox = {
    0,
    0,
    100,
    200
}
\stoptyping

This results in:

\starttyping
1 [1] NEWTABLE  0 4 0
2 [2] LOADK     1 -2     ; 0
3 [3] LOADK     2 -2     ; 0
4 [4] LOADK     3 -3     ; 100
5 [6] LOADK     4 -4     ; 200
6 [6] SETLIST   0 4 1    ; 1
7 [6] SETGLOBAL 0 -1     ; boundingbox
8 [6] RETURN    0 1
\stoptyping

The resulting tables are not only smaller in terms of bytes, but are also less
memory hungry when loaded. For readability we write tables that contain only
numbers, strings or boolean values in an inline format:

\starttyping
boundingbox = { 0, 0, 100, 200 }
\stoptyping

The serialized tables are somewhat smaller, depending on how many subtables
are indexed (bounding boxes, lookup sequences, etc.):
\starttabulate[|r|r|l|]
\NC \bf normal \NC \bf compact \NC \bf filename               \NC \NR
\NC 34.055.092 \NC 32.403.326  \NC arabtype.tma               \NC \NR
\NC  1.620.614 \NC  1.513.863  \NC lmroman10-italic.tma       \NC \NR
\NC  1.325.585 \NC  1.233.044  \NC lmroman10-regular.tma      \NC \NR
\NC  1.248.157 \NC  1.158.903  \NC lmsans10-regular.tma       \NC \NR
\NC    194.646 \NC    153.120  \NC lmtypewriter10-regular.tma \NC \NR
\NC  1.771.678 \NC  1.658.461  \NC palatinosanscom-bold.tma   \NC \NR
\NC  1.695.251 \NC  1.584.491  \NC palatinosanscom-regular.tma \NC \NR
\NC 13.736.534 \NC 13.409.446  \NC zapfinoextraltpro.tma      \NC \NR
\stoptabulate

Since we compile the tables to bytecode, the effects are more spectacular
there.

\starttabulate[|r|r|l|]
\NC \bf normal \NC \bf compact \NC \bf filename               \NC \NR
\NC 13.679.038 \NC 11.774.106  \NC arabtype.tmc               \NC \NR
\NC    886.248 \NC    754.944  \NC lmroman10-italic.tmc       \NC \NR
\NC    729.828 \NC    466.864  \NC lmroman10-regular.tmc      \NC \NR
\NC    688.482 \NC    441.962  \NC lmsans10-regular.tmc       \NC \NR
\NC    128.685 \NC     95.853  \NC lmtypewriter10-regular.tmc \NC \NR
\NC    715.929 \NC    582.985  \NC palatinosanscom-bold.tmc   \NC \NR
\NC    669.942 \NC    540.126  \NC palatinosanscom-regular.tmc \NC \NR
\NC  1.560.588 \NC  1.317.000  \NC zapfinoextraltpro.tmc      \NC \NR
\stoptabulate

Especially when a table is partially indexed and partially hashed, readability
is a bit less than normal, but in practice one will seldom consult such tables
in their verbose form.

After going beta, users reported problems with scaling of the Latin Modern and
\TeX-Gyre fonts. The troubles originate in the fact that the \OPENTYPE\
versions of these fonts lack a design size specification, while the Latin
Modern fonts do come in design sizes other than 10 points. Here the power of a
flexible \TEX\ engine shows \unknown\ we can repair this when we load the font.
In \MKIV\ we can now define patches:

\starttyping
do
    local function patch(data,filename)
        if data.design_size == 0 then
            -- take the first number in the base name as the design size
            local ds = (file.basename(filename)):match("(%d+)")
            if ds then
                logs.report("load otf",string.format("patching design size (%s)",ds))
                data.design_size = tonumber(ds) * 10
            end
        end
    end

    -- the keys are patterns that are matched against the font name
    fonts.otf.enhance.patches["^lmroman"] = patch
    fonts.otf.enhance.patches["^lmsans"]  = patch
    fonts.otf.enhance.patches["^lmmono"]  = patch
end
\stoptyping

Eventually such code will move to typescripts instead of living in the kernel
code.

\stopcomponent