WikiDB/Tables

From TestWiki
< WikiDB
Revision as of 12:28, 31 March 2019 by HappyDog (Talk | contribs) (wikidb_fielddata: Removed duplication and a bit more info.)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

There are three additional tables that WikiDB will add to the database in order to store the data required by the extension. In all cases, the data stored in these tables is derived from data elsewhere on your wiki, therefore it is always safe to delete and rebuild these tables, if necessary.

If you have any suggestions about how these tables could be further optimised, please discuss them on the talk page.



IMPORTANT NOTES - READ THESE BEFORE YOU CONTINUE!

  1. WikiDB comes with scripts to add these tables to the database, and to populate them with data (or to reset/re-populate them if necessary, e.g. because your configuration has changed). The scripts will also allow you to upgrade existing installations if the schema has changed. Therefore, so long as you have command-line access to run the maintenance scripts then you should never need to manually modify the database.
  2. The schema may change as the extension develops and changes may be incompatible with the current version. It may not be obvious how to change from one version to another, and if the incorrect version is used, the behaviour is undefined. Therefore, take care if you are unable to run the command-line scripts, and so need to modify the tables manually.
  3. If a new release makes changes to the schema, then the major version of the WikiDB software will be incremented, so it should be easy to tell if you need to update your schema. See WikiDB/Versioning for more details.
  4. The table definitions on this page are live, and are pulled from the DB that is powering this wiki. Any changes made to the live DB will automatically be reflected here (which is achieved using my Debugging tools extension). This has two implications. Occasionally I upgrade the wiki before a stable release is made available, for testing purposes. In this case the schema shown here may be more recent than the latest released version. Check Special:Version to see what version of WikiDB this wiki is running. Secondly, because these are updated live and are not shown in the page history, there is no easy way to view the schema for old versions of WikiDB. If this is a problem, you should check the SQL in ~/maintenance/sql/tables.sql, though this is not in a format that is ready to be run directly on your database.
  5. I cannot be held responsible for any problems or damage that may arise from using the software, deploying with the wrong schema or running the SQL on this page... standard disclaimer, etc. :)


wikidb_tables[edit]

Live Table Definition: wikidb_tables

CREATE TABLE IF NOT EXISTS `wikidb_tables` (
  `table_namespace` int(11) NOT NULL,
  `table_title` varchar(255) NOT NULL,
  `table_def` mediumblob NOT NULL,
  `redirect_namespace` int(11) DEFAULT NULL,
  `redirect_title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`table_namespace`,`table_title`),
  KEY `wikidb_tables_redirect_ns_title` (`redirect_namespace`,`redirect_title`)
) ENGINE=InnoDB;

Contains information about all tables defined in the wiki.

Purpose:

  • To stop us having to parse the table's article text whenever data is requested.

Needs to be updated:

  • Whenever a page in a DB namespace is created/saved.
  • Whenever a page in a DB namespace is moved.
  • Whenever a page in a DB namespace is deleted/undeleted.
  • Whenever the $wgWikiDBNamespaces configuration setting is changed.

Considerations:

  • If you remove a table namespace redundant records will be left in this table. This may affect data display.
  • If you convert an existing namespace to a table namespace, this table will not be updated.
  • Both of the above issues can be resolved by running the UpdateTableNS maintenance script.


wikidb_rowdata[edit]

Live Table Definition: wikidb_rowdata

CREATE TABLE IF NOT EXISTS `wikidb_rowdata` (
  `row_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `page_namespace` int(11) NOT NULL,
  `page_title` varchar(255) NOT NULL,
  `table_namespace` int(11) NOT NULL,
  `table_title` varchar(255) NOT NULL,
  `parsed_data` mediumblob NOT NULL,
  `is_stale` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`row_id`),
  KEY `wikidb_rowdata_table_ns_title` (`table_namespace`,`table_title`),
  KEY `wikidb_rowdata_ns_title` (`page_namespace`,`page_title`),
  KEY `wikidb_rowdata_is_stale` (`is_stale`)
) ENGINE=InnoDB;

Contains a record for each row in each database. These may defined anywhere in the wiki.

Purpose:

  • To avoid having to parse all articles whenever data is required.

Needs to be updated:

  • Whenever any page is created/saved.
  • Whenever any page is moved.
  • Whenever any page is deleted/undeleted.
  • If we decide to cache formatted versions of the field data, whenever wikidb_tables is updated.


wikidb_fielddata[edit]

Live Table Definition: wikidb_fielddata

CREATE TABLE IF NOT EXISTS `wikidb_fielddata` (
  `row_id` int(8) unsigned NOT NULL,
  `field_name` varchar(255) NOT NULL DEFAULT '',
  `field_value` varchar(255) NOT NULL DEFAULT '',
  KEY `wikidb_fielddata_row_id` (`row_id`),
  KEY `wikidb_fielddata_field_name` (`field_name`(75),`field_value`(75))
) ENGINE=InnoDB;

Contains a record for each field in each row in each database.

Purpose:

  • To be searchable and sortable.
  • Actual data for display is pulled from the row data.

Needs to be updated:

  • Whenever wikidb_rowdata is updated.
  • Whenever wikidb_tables is updated, as this affects the way we store the data in this table.
  • Whenever data types are added, removed or have their data-handling behaviour modified.

Considerations:

  • All values are in the same text field.
  • Sorting via this field should always give the correct order (whether the values are numeric or alphabetic).
    • This is handled by ensuring that all data is stored in a format that ensures an appropriate order when a text sort is used, including when a field contains more than one data type. For example, numeric values are padded with spaces, so 9.5 becomes  p        9.5        , where the string starts with a space (so numbers sort before strings) followed by a character to indicate whether it's positive or negative, and then uses spaces to ensure the decimal point always aligns.
    • An alternative approach, which will probably improve performance (at the expense of disk space) would be to have multiple columns, one per supported data type, and store the data in multiple representations.
    • A third option would be to CAST() or otherwise manipulate the stored data at the point the queries are run, but this feels like it would give too large a performance hit, would be harder to make portable across DB engines and may not be possible for all data types. However, it would remove the need for the deferred updates, so table changes have immediate effect.
  • When a table definition changes, all rows in the affected table are marked as stale and are regenerated slowly over time. This is to avoid performance issues when a high-usage table is modified. The RefreshStaleData maintenance script can be used to force an update more quickly.