NewView Design Notes
--------------------

Like all good hobbyist apps NewView didn't really get an overall design,
but some aspects of it are somewhat non-obvious.

NewView.exe is a Speedsoft Sibyl application. The original Sibyl libraries
have been significantly bug-fixed, and slightly enhanced for performance etc.
They were originally Sibyl Fixpack 3, as I had difficulties with Fixpack 4
(FP4 introduced Linux support, so changed many things and broke some).

The new HelpMgr.dll is written using Open Watcom C++
http://openwatcom.org
Actually it's straight C, for some reason I did not use C++...

===============================================================================
Strings
-------

Unfortunately Sibyl is still stuck in the Delphi 1.0 (?) days when strings
were at most 255 bytes. This makes them useless for a lot of the stuff NewView
does (such as decoding help topics). Sibyl has AnsiStrings, which are large,
referenced counted strings. I think they do work, but unfortunately most of
Sibyl's libraries do not use them, and without optimisation there is no
efficient way to concatenate onto ansistrings(!).

As a result of all this I use pchars (zero terminated strings), or
my TAString class. This is incredibly tedious compared to automatically managed
strings, but no worse than any other objects in Object Pascal, and these strings
are pretty efficient for concatenation, and have no length limits. I made
many conversion functions such as AsString to make it easier to combine with
other types of strings.

===============================================================================
Performance
-----------

Currently, TAString has a cunning piece of code that explicitly checks for
invalid object references in *every method*. While very cunning, this is kind of
ridiculous performance wise, because it sets up exception handling even for addition
of a single character. Probably, it should be compiled out for a release mode.

Sibyl's compiler is unfortunately completely broken in optimising mode. How a
commercial compiler could be released like this is an interesting question. Suffice
to say, Sibyl gives you fast-ish compiled code, but it is probably less than half
as fast as a properly optimsed C/C++ program, maybe much worse. This is primarily
because it does not make any use of registers, instead using stack for nearly
everything.

As a result, I have done a few key things:
- const string parameters wherever possible.
  This avoids the large overhead of copying the entire string each call
- assembly for a few key things
  Sibyl already has some assembly for certain string ops, I added a few more.
  More could be used, but there are diminishing returns. Sibyl does have
  a highly optimised Move (memory copy) so that is used e.g. in TAString
- Minimising string operations
  Some bits of the help topic decoding are a bit longer to avoid so much
  copying of strings.

Of course, algorithms have been optimised carefully, especially in the area
of file I/O: I changed the help file loader over from loading the entire file
to just loading pieces as needed.

On the whole this achieves decent peformance. Unfortunately, the SPCC component
library seems somewhat sluggish - not to the level of interpreted code, but
definitely much less snappy than compiled C apps. Here, I think optimisation is
desperately needed, to speed up all the repititive loops in e.g. loading forms
from SCU (resource) data. Short of re-writing the whole thing in C/C++ this cannot
be improved much.

===============================================================================
Multi-lingual Support
---------------------

The goals of the language code is

a) Load languages at runtime, reasonably fast
b) Be human editable, so that people can contribute without
   any tools required or even contacting the author
c) Be able to update an existing language file with missing
   items to avoid manual tracking of which strings have
   been translated
d) Minimal specific code

b and c are to encourage open source contributions.

Resources or msg files would satisfy a, but neither is human
editable, and both are difficult to update at runtime.

Instead, I implemented the following design

Language Files

These files are simply text files, each line consisting of:

  <item name> "<item value>"

<item name> is the name of various text strings from the application.
The names are defined by the application code.
<item value> is the string that should be used for this language.
Strings can be DBCS for e.g. Japanese.

Forms

When created, each form registers a method of itself for language events,
which looks like this:

    Procedure OnLanguageEvent( Language: TLanguageFile;
                               const Apply: boolean );

This callback is called in one of two cases:
  1) Apply = true: A new language is being loaded, or
  2) Apply = false: A language file is being updated/saved. (more later).

In the first case, the form is expected to load all it's strings from
the language file it is given. Specifically, it should:

  1) Call the Language.LoadControlLanguage( self )

     This loads strings for any static control elements e.g.
     menu and button captions and hints. Parts of controls that
     are normally variable e.g listbox items are not loaded.

  2) Load any strings to be used in code ("code strings")

     It should do this by calling Language.LL:
       Language.LL( Apply, <string var>, '<name of string>', '<default>' );
     for each string to be found.
     LL will look up the given name in the language file,
     then store the value into the string variable, or <default> if not found.

When the form first registers itself, the language callback will be
called immediately, so the form can load the current language.

Note: Even if the default language is being used (ie. no language file used,
ie. English) this still happens so that default values can be loaded for
code strings (e.g. prompts).

Strings Outside of Forms

Code strings that are needed but aren't related to a form can be
registering a procedure at unit initialization:

  Initialization
    RegisterProcForLanguages( OnLanguageEvent );

These language events look the same as for forms but are not methods:

  Procedure OnLanguageEvent( Language: TLanguageFile;
                             const Apply: boolean );

this procedure should

    1) Set the prefix: Language.Prefix := '<Category>.';

       This is mostly a convenience, allowing codestrings to
       be grouped together. <Category> could be unit name, for instance.

       NOTE: LoadControlLanguage sets prefix automatically to the form
       name being loaded, which allows related codestrings to automatically
       appear under the form name. However, this prefix remains in effect
       until changed or another form is loaded, so you MUST set it if loading
       strings outside a form.

    2) Call LL for each needed string, as before

Loading Languages

  var
    Language: TLanguageFile;

  try
    Language := TLanguageFile.Create( Filename );
  except
    ...
  end;

  ApplyLanguage( Language );

This will call all registered methods and procedures telling them that
a new language is to be loaded.

Updating Language Files

  try
    Language := TLanguageFile.Create( Filename );

    UpdateLanguage( Language );

    Language.AppendMissingItems;
  except
    ...
  end;

  Language.Destroy;

UpdateLanguage calls all registered methods and procedures, but with
Apply set to false so that required items are searched for, but not
actually applied to components or code string variables.

After UpdateLanguage, the TLanguageFile contains a list of missing items
and their default values. AppendMissingItems writes this list to the
end of the file called filename.

When a new language file is being saved, none of the required items
will be found, so all of them will be stored. Note that the strings
saved to the file will be the default value for codestring, but the
last loaded language strings for forms.

Dynamically Loaded Forms

In order that language files can include all required strings,
all forms must be loaded before updating a language file. This can be
semi-automated by registering an "ensure loaded" procedure for each
form:

  procedure EnsureMyFormLoaded;
  begin
    if MyForm = nil then
      MyForm := TMyForm.Create( nil );
  end;

the registering this procedure in the UNIT initialization:

  Initialization
  ...
    RegisterUpdateProcForLanguages( EnsureMyFormLoaded );

UpdateLanguage will call all these procedures before accessing the
language file, ensuring that all forms are loaded.

Performance of String Lookups

Requirement (a) included the desire to be "reasonably fast". Well, looking up
names in a string list is not the fastest thing to do. I've made this use
a binary search, so the algorithm is O( N log N ) which is generally considered
acceptable. I haven't done any timings though.

Testing

... there are currently 443 strings required for NewView ...

In practice it seems to take no detectable time to load a language. That's
on my Duron 1.1G, 256MB RAM.

Multilingual stuff could be further optimised:
- Optimise for sequential lookup. Most of the time, the next item will be
  immediately after the previous item, so long as the language file
  has not been rearranged. If not, fall back to normal search.
  This is probably the simplest and most beneficial.

- change to pstrings: instead of copying strings from the language file
  (codestrings only), store pointers to them.
  This would save time (since not copying so many strings) and memory
  (since there would not be two copies of all strings).
  The time is probably not significant because there is not a lot of data.
  The memory would be nice to save, but a lot of the strings are component
  properties which are generally allocated dynamically anyway, and we
  cannot pass a pstring to component string properties.

- could free strings from the language file after using them.
  This would save memory, but might have side effects, and would be no faster.
  OTOH it would save a lot of memory since even component properties would
  not be duplicated.

- lookup strings hiearchically (e.g. first search for formname in a small list)
  This would be a huge improvement in speed for random access, but is probably
  not needed if sequential access is optimised (see above)

===============================================================================
Exception Logging
-----------------

I added code to the Sibyl libraries to store a callstack into global variables.
Performance wise this is outrageously inefficient since it happens on every
exception, but then exceptions should not occur in normal processing.

This global callstack can be accessed in a method specified in Application.OnException.

Annoyingly, this does not get called in exceptions from other (non-modal) forms.


===============================================================================
Help Manager
------------

Replacing HELPMGR.DLL (c:\os2\dll\helpmgr.dll).

HlpMgr2.pas contains a DLL implementation with the same interface.

Use DLLRNAME target.exe HELPMGR=HLPMGR2 for testing.

Initially, I was going to put all the code in HLPMGR2, loading mainform etc.

Problem is, that programs using help manager may have very small stack space, e.g.
ICONEDIT.EXE has 24kB. NewView overflows this and crashes.

Also, as a principle, I feel uncomfortable loading all my unreliable code
into an existing, possibly very robust application's process space, potentially
crashing it if anything goes wrong.*

The third problem was the need to implement 16-bit entry points for certain
older applications, such as PMCHKDsk.

Therefore, instead we launch a NewView session (and relaunch if needed)
The filename is passed as normal.

In addition these parameters are passed:
/hm:xxx
indicating "helpmanager mode"
xxx is the HelpManager window, for NewView to send messages to.

/owner:yyy
yyy is the application window using helpmanager


* HelpMgr is small, tight C code with lots of validation and no GUI code.
I don't think I have had any crashes in released versions, or at least very few
(especially compared to NewView itself ;)

Who Owns the Help Window
------------------------

In original helpmgr, the help window is set to be owned by EITHER
a) the active top-level window when help requested (by whatever means)
OR, if set:
b) the "relative window" set by a HM_SET_ACTIVE_WINDOW message.

In NewView I've currently disabled the ownership; because it's usually more annoying than helpful.

Associations
------------

One or more windows can be associated with a help instance.

I suspect the original HelpMgr stores a window's associated help instance somewhere in window words, but this is not documented.

Instead, I store a list of associated windows with each help instance.

===============================================================================

Searching
---------

The IPF compiler creates a search index table of all the "words" in the text of help topics.
This does not include the titles (displayed in contents) of topics, nor does it include index words.

Each entry in the search table is either a series of alpha numeric characters, or a single non-alphanumeric symbol. So for example if the text of a topic includes __OS2__ then the search table will have entries for "_" and "OS2".

However it is often useful to be able to search for "words" containing symbols, so the program must look for consecutive strings of search table words.

NewView does this by this method:

For each search term (TSearchTerm):
  1. Seperate into IPF words (symbols and alphanumeric strings) (TSearchTerm.Parts)
  2. Search file dictionary
    for starting words, search for finishing matches.
      e.g searching for "if(" we would look for words ending in "if"
    for middle words, search for exact matches (case insensitive).
      e.g searching for "_os2_" we would look for words matching "os2". 
      There could only be "os2" "Os2" "OS2" and "oS2". 
    for end words, search for beginning matches.
      e.g searching for "if(" we would look for words starting with "("
  3. Matched topics are where there is one or more matches for all words
     (This is obtained by looking up matching topics for each word and 
      ANDing the results together.)
    AND a search thru the actual topic text shows one or more occurrences of
    a matching sequence. e.g. we don't want _os2_ to match a topic 
    that just contains an underscore somewhere and "os2" elsewhere.
    This is complicated by the fact that each word can match more than 1 dictionary
    word.



Search logic is:

topic match if:
      ( ( matches all required terms ) or ( there are no required terms) )
  and ( ( matches any optional term ) or ( there are no optional terms ) )
  and ( doesn't match any excluded term ) or ( there are no excluded terms )

Algorithimically this can be done sequentially:

if no terms
  nothing matches, abort

if ( any optional terms) 
  match each term, or results
else
  all topics match by default

if ( any required terms )
  match each term, and results 

if ( any excluded terms )
  match each term, and ( not results )

... this has the benefit of not requiring keeping 2 flags per topic
but who cares... that's not a big cost

Keep an "allowed" (=not excldued) flag a "matched" flag
 
set allowed to true for all
set matched to false for all

if optional term
  or results -> matched flags
if required term
  and results -> allowed flags
  or results -> matched flags
if excluded term
  and not results -> allowed flags

In practice, for speed this is implemented with arrays of integers acting
as flags, one for each topic. This is very susceptible to optimisation, even eg. SSE!
but I have not done any since it seems pretty fast as is.
