Post Wrap-up Changes

Several changes need to be done manually to the file:

  • Pages that were really really big weren't fully printed by the code on Smallwiki Conversion Code, they ended up ending in ...etc..., these were copied manually.
  • Pages that contain pictures weren't displayed correctly. These needed to have a div before (or after) got make the pictures block elements. The tags are:
    <div style="clear: both"> <img ...> </div>
  • Smallwiki didn't support the full renaming and moving functionality that Pier supports. Sometimes renaming a page wouldn't work. In this case, the pages are renamed in the file.
  • Look for any string &#[0-9][0-9][0-9][0-9]; - These are usually M$ characters which aren't printed out correctly.
  • Smallwiki also had a habit of adding \\ before an unmatched '*', and would sometimes build up a long list of these. The same is true for '['.
  • Many other problems were found to be easier to fix in Smalltalk:

    • any line that starts with : or ;
    • table cells that contain || - Pier interprets this a centering text, and leaves an empty cell at the end.
    • new edits:
      • 4 double quotes, as in: ""bold""
      • 4 single quotes, as in: ''italics''
      • 4 equal signs, as in ==monospace==
      • 4 dashes, as in --strikethrough--
      • 4 at signs, as in @@subscript@@
      • 4 carrets, as in ^^superscript^^
      • 4 underscores, as in __underlined__
    • In SmallWiki, pre-formatted code that starts with '=' could have HTML market, in Pier, this isn't an option (but it makes pre-formatted text simplier, especially when explaining HTML).
    • The &#43; character isn't a special character in smallwiki, it needed to be changed into one for pier. When this is used in normal text, it needs to be changed to &AMP;#43;. Exceptions were:
      • When found in tags, such as &lt;font size=&#43;2>
      • In URLs, such as &#43;http:...Open+%2F...&#43;
    • Pages with ASCII art made with <PRE> tags had to be modified so dashes wouldn't turn into strikethrough characters, text prefixed with the '=' was OK.

    Here's the Smalltalk code to fix these. It reads the file wiki.test.st and creates the file test.new.st:

    | file1 line outStream shouldConvertPre preFormattedLines updatedPreLines count markupConversions convertOriginalBlock conversionDictionary convertHtmlToPier filename dir |
    Transcript cr; show: 'Starting at: '; show: Time now.
    filename := 'test.new.st'.
    markupConversions := #('==' '''''''''' '""' '^;' '^;' '__' '@@' '\^\^') collect: [:pattern | RxMatcher forString: pattern ignoreCase: false].
    convertOriginalBlock := [:string | markupConversions inject: string into: [:str :re || results | 
    		results :=  re copy: str translatingMatchesUsing: [:match | '\', match]].
    	(RxMatcher forString: '[^#-]--' ignoreCase: false) copy: results translatingMatchesUsing: [:match | match first asString, '\--']].
    (conversionDictionary := Dictionary new) at: (RxMatcher forString: '<code>[^<]*</code>' ignoreCase: true) put: #('==' 7 7);
    	at: (RxMatcher forString: '<i>[^<]*</i>' ignoreCase: true) put: #('''''''''' 4 4);
    	at: (RxMatcher forString: '<b>[^<]*</b>' ignoreCase: true) put: #('""' 4 4);
    	at: (RxMatcher forString: '<u>[^<]*</u>' ignoreCase: true) put: #('__' 4 4).
    convertHtmlToPier := [:string | string isEmpty ifTrue: [string] ifFalse: [
    		string := ([conversionDictionary keys contains: [:re | re search: string]] whileTrue: [
    			(conversionDictionary keys select: [:re | re search: string]) do: [:re | string := re copy: string
    					translatingMatchesUsing: [:match || array | array := conversionDictionary at: re.
    						array first, (match copyFrom: array second to: match size - (array third)) , array first]]]) ifNil: [string].
    		(('*font size=+*' match: string) or: [(RxMatcher forString: 'http[^ ]*\+' ignoreCase: false) search: string]) ifTrue: [string]
    			ifFalse: [string := (RxMatcher forString: '\+' ignoreCase: true) copy: string translatingMatchesUsing: [:p | '&#43;'].
    				string first = $| ifTrue: [string := (RxMatcher forString: '\|\|' ignoreCase: true) copy: string translatingMatchesUsing: [:p | '| |']].
    				string
    			]]].
    count := 0.
    line := nil.
    preFormattedLines := OrderedCollection new.
    updatedPreLines := OrderedCollection new.
    outStream := file1 := nil.
    [	
    	((dir := FileDirectory default) fileExists: filename)
    		ifTrue: [self halt.  outStream := dir oldFileNamed: filename.  outStream setToEnd]
    		ifFalse: [outStream := dir newFileNamed: filename].
    	file1 := FileStream fileNamed: 'wiki.test.st'.
    	"file2 := FileDirectory default newFileNamed: 'test.new.st'."
    	shouldConvertPre := false.
    	[file1 atEnd] whileFalse:
    		[line := file1 nextLine.
    			count := count + 1.
    			(line notEmpty and: [line first = $=]) ifTrue: [shouldConvertPre := shouldConvertPre | (line includesAnyOf: '<').
    			preFormattedLines add: line.
    			shouldConvertPre ifFalse: [
    				updatedPreLines add: (line copyWithRegex: '&[^;]*;' matchesTranslatedUsing: 
    					[:p || digits | p asUppercase = '&LT;' ifTrue: ['<'] 
    						ifFalse: [p asUppercase = '&GT;' ifTrue: ['>']
    						ifFalse: [(p second = $# and: [(digits := p copyFrom: 3 to: (p size - 1)) isAllDigits and: [digits asInteger < 256]]) 
    							ifTrue: [String with: (Character value: digits asInteger)]
    							ifFalse: [shouldConvertPre := true. p]]]])]
    			] ifFalse: [
    				preFormattedLines isEmpty ifFalse: [
    					shouldConvertPre ifTrue: [outStream nextPutAll: '<pre>'.
    						preFormattedLines allButLast do: [:lineStartingWithEquals | outStream nextPutAll: (
    							convertHtmlToPier value: (convertOriginalBlock value: lineStartingWithEquals copyWithoutFirst)).
    							outStream cr; cr].
    						(preFormattedLines last matchesRegex: '.*[^'']''.') ifTrue: [outStream nextPutAll: (
    							convertHtmlToPier value: (convertOriginalBlock value: (preFormattedLines last copyFrom: 2 to: (preFormattedLines last size - 2)))).
    								outStream nextPutAll: '</pre>''.'] ifFalse: [outStream nextPutAll: (
    							convertHtmlToPier value: (convertOriginalBlock value: preFormattedLines last copyWithoutFirst)).
    								outStream nextPutAll: '</pre>'].
    						outStream cr] ifFalse: [
    				updatedPreLines do: [:lineStartingWithEquals | outStream nextPutAll: lineStartingWithEquals; cr]]].
    				preFormattedLines := OrderedCollection new.
    				updatedPreLines := OrderedCollection new.
    				shouldConvertPre := false.
    				outStream nextPutAll: (convertHtmlToPier value: (convertOriginalBlock value: line)); cr]
    		]
    	] ensure: [file1 isNil ifFalse: [file1 close].
    		outStream isNil ifFalse: [outStream close]].
    Transcript cr; show: 'Read '; show: count; show: ' lines'.
    Transcript cr; show: 'Finishing at: '; show: Time now.

This script may take five or ten minutes to run on a large wiki. After these changes are done, the file test.new.st can be split into smaller files, and executed in the new Pier image using doits.

After importing, the persistency will have the page changes, but they won't be sorted by date. Use the following to re-sort them:

PRKernel instances anyOne persistency inspect.

In the inspector, do the following:

history := (self history asSortedCollection: [:a :b | a command timestamp < b command timestamp]) asOrderedCollection

When a page is moved, it will not correct its links. Run the following to fix this (taken from PRExportImportWidget>>sanitizeImport):

PRKernel instances anyOne root enumerator everything do:
  [:page | page outgoingReferences do: [:link |
	(link isInternal and: [link isBroken not])
				ifTrue: [link update]]]

Its also a good idea to look for any broken links, there should be no more broken links when in the Smallwiki Assessment step.

| brokenLinks |
brokenLinks := Dictionary new.
PRKernel instances anyOne root enumerator everything do: [:page | page outgoingReferences do: [:link |
			(link isInternal and: [link isBroken])
				ifTrue: [brokenLinks at: link put: page]]].
brokenLinks inspect