WikiDB/Tutorial: Creating your own data type

From TestWiki

Latest revision as of 23:38, 17 July 2021

As well as the built-in data types, WikiDB provides a simple but powerful mechanism for creating your own. This tutorial should provide all the information you need to add your own data type handlers to WikiDB.

Background

Because everything in WikiDB comes from pages of the wiki, all data is entered as text. It is therefore up to the data type handler to decide whether a particular string is a valid representation of data of that type, and to render the value in an appropriate manner for display. For example, an integer data type might recognise "5.4" as valid input data, but display it as "5" because it has been rounded to an integer value.

The handler will also need to format the data in a form suitable for sorting. For example, for an integer data type, the value "50" should come after "6", so we need to format numeric values in a way that ensures sorting always works as expected. As this can be a little complicated, a more detailed discussion of this topic is given below.

Finally, because data is by default untyped, we provide a mechanism to allow the data type to be 'guessed' for untyped fields. For example (assuming no custom data types have been added which change this behaviour), if the field is untyped then WikiDB will deduce that a value of "808" is an integer, whereas "808 State" is a wikistring, and will sort/render the data accordingly.

A data type handler therefore needs to respond to four different actions:

  • Validate
  • FormatForDisplay
  • FormatForSorting
  • GetSimilarity

A skeleton data handler

A data handler is basically a PHP function which takes three arguments:

  • $Action is the appropriate WIKIDB_* constant which dictates the action that is to be performed.
  • $Value is the string value that was entered. Leading/trailing white-space will have been removed, but the data will be otherwise unmodified.
  • $Options is an array of options that were specified for the field. Note that all options are, well, optional so your extension should adopt sensible defaults for any missing items. The array is numerically indexed in the order the options were defined in the field definition.

The function may be defined as a standalone function or as a class method. Here is an example data handler function with comments explaining how each of the actions should be handled.

<?php
 
function MyHandler($Action, $Value, $Options) {
 
	// If your handler recognises any options, it is sensible to
	// validate/initialise them here as they are likely to be required by several
	// of the actions.  Invalid option values should be silently ignored.
 
	switch ($Action) {
		case WIKIDB_Validate:
			// Check if the supplied input is valid for this type,
			// and return true if it is, or false if not.
 
		case WIKIDB_GetSimilarity:
			// Look at the data and return a number between 0 and 10 to indicate
			// how strongly you feel that the data should be recognised as being
			// of this type.  10 means that the data is DEFINITELY the type in
			// question, and 0 means it is definitely not (which should always
			// be returned if the data is not valid for the type).
			// For example, if you were implementing a 'temperature' type, you might
			// give the value "50" a score of 5 meaning that it is a valid
			// temperature, but it isn't quite specific enough to be sure that this
			// is definitely what was meant.  However "50°C" may result in a score
			// of 10 indicating that it is unlikely to be anything else.
			// A piece of data will be considered to be of the type which returns
			// the highest number from this action.  If there is more than one type
			// returning the same number, then the first one that was defined
			// wins.
 
		case WIKIDB_FormatForDisplay:
			// Return a string representation of the value, for display within the
			// wiki.  Will only be called for values that have passed validation.
			// The returned value will be parsed as wikitext, so if you want it
			// to be output literally, you should wrap it in <nowiki></nowiki> tags.
 
		case WIKIDB_FormatForSorting:
			// Return a string representation of the value suitable for sorting
			// purposes.  The maximum length of the returned string is 255
			// characters (if it is longer, it will be truncated).
 
	}
 
}
 
?>
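
The 'temperature' scoring described in the skeleton's comments could be sketched roughly as follows. This is only an illustration: the handler name ExampleTemperatureHandler is made up, the accepted input format is an assumption, and the constant values defined at the top are placeholders so that the sketch runs outside WikiDB (WikiDB defines the real WIKIDB_* constants). Only the Validate and GetSimilarity actions are shown.

```php
<?php
// Placeholder values so the sketch runs on its own.  WikiDB defines the real
// WIKIDB_* constants; their actual values are not assumed here.
if (!defined("WIKIDB_Validate")) {
	define("WIKIDB_Validate", 1);
	define("WIKIDB_GetSimilarity", 2);
	define("WIKIDB_FormatForDisplay", 3);
	define("WIKIDB_FormatForSorting", 4);
}

// Hypothetical 'temperature' handler (not part of WikiDB).
function ExampleTemperatureHandler($Action, $Value, $Options = array()) {

	// Assumed format: an optionally signed number, optionally followed by a
	// degree sign (U+00B0) and an optional "C" or "c".
	$Pattern = '/^([+-]?\d+(?:\.\d+)?)\s*(\x{00B0}[Cc]?)?$/uD';
	$IsValid = (preg_match($Pattern, $Value, $Matches) === 1);

	switch ($Action) {
		case WIKIDB_Validate:
			return $IsValid;

		case WIKIDB_GetSimilarity:
			// Invalid data must always score zero.
			if (!$IsValid) {
				return 0;
			}
			// A bare number might be a temperature (5); a number with a
			// degree sign is unlikely to be anything else (10).
			return !empty($Matches[2]) ? 10 : 5;

		default:
			// FormatForDisplay and FormatForSorting are omitted from this sketch.
			return null;
	}
}
```

With this sketch, "50" scores 5, "50°C" scores 10 and "808 State" scores 0, so a bare number would still lose out to any other type that scores it higher than 5.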

Sorting out sorting

In WikiDB all data is sorted using a string sort, because all wiki data is string data. Values are input as strings of wikitext and are stored in the DB in a single VARCHAR column, which is sorted using an SQL ORDER BY clause. Therefore, for all data types, we need to generate a string representation that will cause the value to sort correctly in this context. We should also bear in mind that values from different fields may be part of the same sort, so sorting needs to make sense not just in the context of the expected data type, but also alongside values of other types.

There are usually two things to consider:

  • If the data type would normally be sorted by something other than a string sort, how can it be represented so that it sorts properly? (For example, how do we ensure that "50" comes after "6"?)
  • If the data contains information that is not relevant to sorting, how should it be normalised in order to make it sort correctly? (For example, if a temperature data type recognises both raw numbers such as "50" and numbers with the 'degrees' symbol such as "50°", then we need to ensure that these values are considered equivalent.)

As an example of how this can be achieved, look at the way numbers are handled in WikiDB.

For sorting, we build up the string representation as follows:

  • A space character - This ensures numbers always sort before string values (all values are trimmed, so no strings will start with a space).
  • n or p to indicate negative or positive (these abbreviations are conveniently sorted correctly). Zero is considered as positive.
  • The integer part of the number (123 in 123.456), left-padded with spaces to take it up to 126 characters.
  • . (period) character as the decimal point.
  • The decimal part (456 in 123.456), left-padded with spaces to take it up to 126 characters.

This ensures that we get a 255 character string which will always sort correctly.

WikiDB provides a public function that you can use to format numbers in this way. WikiDB_NumberToSortableString($Number) will convert a numeric field into the above sortable format - you are advised to use this function for all numeric data types.
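
To make the layout concrete, here is a rough sketch that follows the listed steps literally. It is not WikiDB's own code and the real function may differ in its details; the name ExampleNumberToSortableString is made up, the input is assumed to be a plain decimal string, and treating a missing decimal part as "0" is an assumption of the sketch. In a real handler you should call WikiDB_NumberToSortableString() instead.

```php
<?php
// Hypothetical illustration of the layout described above (not WikiDB code).
function ExampleNumberToSortableString($Number) {
	// Expects a plain decimal string such as "123.456" or "-7".
	$Text = trim((string) $Number);

	// "n" sorts before "p"; zero counts as positive.
	$Sign = ((float) $Text < 0) ? "n" : "p";
	$Text = ltrim($Text, "+-");

	// Split into integer and decimal parts.  A missing decimal part is
	// treated as "0" (an assumption of this sketch).
	$Parts = explode(".", $Text, 2);
	$IntegerPart = $Parts[0];
	$DecimalPart = (isset($Parts[1]) && $Parts[1] !== "") ? $Parts[1] : "0";

	// Space, sign, padded integer part, period, padded decimal part:
	// 1 + 1 + 126 + 1 + 126 = 255 characters.
	return " " . $Sign
		. str_pad($IntegerPart, 126, " ", STR_PAD_LEFT)
		. "."
		. str_pad($DecimalPart, 126, " ", STR_PAD_LEFT);
}
```

A plain string comparison puts "50" before "6", because "5" sorts before "6". Once both values are padded to the same width, the string produced for "6" sorts before the one produced for "50", which is the order we want.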

A fully-worked example: IP Address

This example shows how a simple data-type to hold an IP address might look. For the purposes of this example, we assume that only the standard IPv4 addresses are valid (they look something like this: 91.198.174.192).

<?php
 
function IPAddressTypeHandler($Action, $Value, $Options = array()) {
 
	switch ($Action) {
		case WIKIDB_Validate:
		// We could do this with a regular expression, but to keep things less scary
		// we'll use the explode() method.
 
		// Firstly, split the string up into the individual octets, which must
		// be separated by periods.
			$arrItems = explode(".", $Value);
 
		// If there are not exactly four octets, this is not a valid IP address.
			if (count($arrItems) != 4) {
				return false;
			}
 
		// Check each item is a valid integer, by casting it to an integer
		// and then back to a string, and checking that we get the same string
		// representation as we started with.
		// We also need to ensure that it is in the range 0 to 255.
			foreach ($arrItems as $Octet) {
				if (strval(intval($Octet)) !== $Octet) {
					return false;
				}
				elseif (intval($Octet) < 0 || intval($Octet) > 255) {
					return false;
				}
			}
 
		// If the above tests passed, it is a valid IP address.
			return true;
 
		case WIKIDB_GetSimilarity:
		// Often the similarity check will be different from the validity check,
		// either more or less strict.  In this case, however, it is the same.
 
		// If the number is a valid IP address, return 10 as we are very sure it is
		// an IP.
			if (IPAddressTypeHandler(WIKIDB_Validate, $Value)) {
				return 10;
			}
		// Otherwise, return zero, as it is definitely not.
			else {
				return 0;
			}
 
		case WIKIDB_FormatForDisplay:
		// In this case, if the IP address was valid, it is already formatted
		// appropriately, so we just return the value.  Other data types may require
		// some additional manipulation here.
			return $Value;
 
		case WIKIDB_FormatForSorting:
		// We want to sort IP addresses by number block, therefore we ensure that
		// each octet is represented by three digits.
 
		// Split the string up into the individual octets.
			$arrItems = explode(".", $Value);
 
		// Left-pad each octet so it is 3 digits long, with leading zeros where necessary.
			foreach ($arrItems as $Key => $Octet) {
				$arrItems[$Key] = str_pad($Octet, 3, "0", STR_PAD_LEFT);
			}
 
		// Glue the items back together and return the resulting string.
			return implode(".", $arrItems);
	}
 
}
 
?>
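
The comments in the Validate branch mention that a regular expression could be used instead of explode(). A minimal sketch of that alternative is shown below. The function name ExampleIsValidIPv4 is made up for illustration and is not part of WikiDB. Like the explode() version, it rejects octets with leading zeros and octets above 255.

```php
<?php
// Hypothetical regular-expression alternative to the explode() check.
function ExampleIsValidIPv4($Value) {
	// One octet: a number from 0 to 255 with no leading zeros.
	$Octet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';

	// Exactly four octets separated by periods.  The D modifier stops "$"
	// from matching before a trailing newline.
	$Pattern = '/^' . $Octet . '(?:\.' . $Octet . '){3}$/D';

	return preg_match($Pattern, $Value) === 1;
}
```

The Validate branch could then simply return the result of this function. Whether that is easier to read than the explode() version is a matter of taste.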

Enabling your new data type

Having created your type handler, you now need to tell WikiDB to use it. To do this, you simply need to add a line in your LocalSettings.php, after you have included WikiDB.php, which tells WikiDB where to find the function.

For example:

  WikiDB_TypeHandler::AddType("ip", "IPAddressTypeHandler");

The first argument is the type name, which is case-insensitive (but, by convention, is normally lower-case). This is the name that is entered on the wiki when defining the field. The second argument may be any valid PHP callback. Note that if your data type handler is defined in an external file, you will need to include this file (or register it with the MediaWiki autoloader) before it is used.
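
If your handler is a class method kept in its own file, the registration might look something like the following LocalSettings.php fragment. The class name MyTypeHandlers, its file path and the IPAddress method are hypothetical; the method is assumed to be a public static method taking the same three arguments as a standalone handler.

```php
// LocalSettings.php fragment, placed after WikiDB.php has been included.
// MyTypeHandlers, its file path and the IPAddress method are hypothetical names.

// Let the MediaWiki autoloader find the class when it is first used.
$wgAutoloadClasses["MyTypeHandlers"] = __DIR__ . "/extensions/MyTypeHandlers.php";

// Register a static class method as the handler, using PHP's array callback form.
WikiDB_TypeHandler::AddType("ip", array("MyTypeHandlers", "IPAddress"));
```

Registering the class with the autoloader means the file is only loaded when the handler is actually needed, rather than on every request.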

Using your data type

To use your data type, you simply specify it when defining a field as per the standard syntax. If the type is recognised then it will be shown in the structure when you view the saved table definition.

Testing the type-sniffing

WikiDB provides a <guesstypes> tag which can be used to test how WikiDB will interpret an undefined field which has a given value (via the GetSimilarity action). The contents of this tag are split into individual lines, which are interpreted as follows:

  • If a line starts with = then it gives the expected result for the previous line (useful for unit-testing). If there is no previous line, it is ignored. If it contains a colon, everything before the first colon is the expected type and everything after the first colon is the expected output. The expected type may be left blank, which is the same as omitting it completely, except that it allows you to supply an expected output that contains a colon.
  • Otherwise, the line is a new data definition line, which should have its type guessed. Everything before the first colon is a label for the output, and the rest of the line is the value to be tested.

Example:

<guesstypes>
An integer value with a passing test: 808
=integer:808
A string value with deliberately failing test: 808 State
=Incorrect value
Unchecked result: 24.
</guesstypes>

Which generates the following:

Description                                   | Original value         | Type       | Typed value            | Formatted for sorting (*)
                                              | Unrendered | Rendered  |            | Unrendered | Rendered  |
An integer value with a passing test          | 808        | 808       | integer    | 808        | 808       | [ p 808.0]
A string value with deliberately failing test | 808 State  | 808 State | wikistring | 808 State  | 808 State | [808 State]
Unchecked result                              | 24.        | 24.       | wikistring | 24.        | 24.       | [24.]

(*) Spaces are currently collapsed, for ease of reading. Check source for full string.


You can see some more examples in my testing page for the internal data types: WikiDB/Data type unit tests

The future

This format has been stable for quite some time, but you should always check the release notes when upgrading WikiDB to check that nothing has changed. Any changes that may affect your custom data handlers will always be documented on that page.