Difference between revisions of "WikiDB/Tables"

From TestWiki
Jump to: navigation, search
(wikidb_tables)
m (wikidb_fielddata: linkify)
 
(12 intermediate revisions by 2 users not shown)
Line 1: Line 1:
There are three new tables in the database that store the WikiDB data. If you have any suggestions about how these could be further optimised, please discuss them on the [[Talk:WikiDB/Tables|talk page]].
+
There are three additional tables that WikiDB will add to the database in order to store the data required by the extension. In all cases, the data stored in these tables is derived from data elsewhere on your wiki, therefore it is always safe to delete and rebuild these tables, if necessary.
 +
 
 +
If you have any suggestions about how these tables could be further optimised, please discuss them on the [[Talk:WikiDB/Tables|talk page]].
 +
 
 +
 
 +
----
 +
'''IMPORTANT NOTES - READ THESE BEFORE YOU CONTINUE!'''
 +
 
 +
# WikiDB comes with scripts to add these tables to the database, and to populate them with data (or to reset/re-populate them if necessary, e.g. because your configuration has changed).  The scripts will also allow you to upgrade existing installations if the schema has changed.  Therefore, so long as you have command-line access to run the [[WikiDB/Maintenance scripts|maintenance scripts]] then you should never need to manually modify the database.
 +
# The schema may change as the extension develops and changes may be incompatible with the current version.  It may not be obvious how to change from one version to another, and if the incorrect version is used, the behaviour is undefined.  Therefore, take care if you are unable to run the command-line scripts, and so need to modify the tables manually.
 +
# If a new release makes changes to the schema, then the major version of the WikiDB software will be incremented, so it should be easy to tell if you need to update your schema.  See [[WikiDB/Versioning]] for more details.
 +
# ''The table definitions on this page are live'', and are pulled from the DB that is powering this wiki.  Any changes made to the live DB will automatically be reflected here (which is achieved using my [[Debugging tools]] extension).  This has two implications.  Occasionally I upgrade the wiki before a stable release is made available, for testing purposes.  In this case the schema shown here may be more recent than the latest released version.  Check [[Special:Version]] to see what version of WikiDB this wiki is running.  Secondly, because these are updated live and are not shown in the page history, there is no easy way to view the schema for old versions of WikiDB.  If this is a problem, you should check the SQL in <code>~/maintenance/sql/tables.sql</code>, though this is not in a format that is ready to be run directly on your database.
 +
# I cannot be held responsible for any problems or damage that may arise from using the software, deploying with the wrong schema or running the SQL on this page... standard disclaimer, etc. :)
 +
 
 +
----
  
''(Note: The table definitions on this page are live.  Changes to the DB will automatically be reflected here.  This is achieved using my [[Debugging tools]] extension.)''
 
  
 
== wikidb_tables ==
 
== wikidb_tables ==
Line 9: Line 22:
  
 
Purpose:
 
Purpose:
* To stop us having to parse the article text whenever data is requested.
+
* To stop us having to parse the table's article text whenever data is requested.
  
 
Needs to be updated:
 
Needs to be updated:
Line 15: Line 28:
 
* Whenever a page in a DB namespace is moved.
 
* Whenever a page in a DB namespace is moved.
 
* Whenever a page in a DB namespace is deleted/undeleted.
 
* Whenever a page in a DB namespace is deleted/undeleted.
 +
* Whenever the [[WikiDB/Configuration settings#$wgWikiDBNamespaces|$wgWikiDBNamespaces configuration setting]] is changed.
  
 
Considerations:
 
Considerations:
 
* If you remove a table namespace redundant records will be left in this table.  This may affect data display.
 
* If you remove a table namespace redundant records will be left in this table.  This may affect data display.
* If you convert an existing namespace to a table namespace, this table will not be updated.  The workaround is to go to each page in the  namespace and to save a NULL edit.
+
* If you convert an existing namespace to a table namespace, this table will not be updated.
* A maintenance script needs to be written to fix-up both the above situations.
+
* Both of the above issues can be resolved by running the [[WikiDB/Maintenance scripts#UpdateTableNS.php|UpdateTableNS maintenance script]].
 +
 
  
 
== wikidb_rowdata ==
 
== wikidb_rowdata ==
Line 32: Line 47:
 
* Whenever any page is created/saved.
 
* Whenever any page is created/saved.
 
* Whenever any page is moved.
 
* Whenever any page is moved.
* Whenever any page is deleted.
+
* Whenever any page is deleted/undeleted.
* If we decide to cache formatted versions of the fields, whenever wikidb_tables is updated.
+
* If we decide to cache formatted versions of the field data, whenever wikidb_tables is updated.
  
  
Line 39: Line 54:
 
<describe_table table="wikidb_fielddata"></describe_table>
 
<describe_table table="wikidb_fielddata"></describe_table>
  
Contains a record for each field in each row in each database.
+
Contains a record for each field in each row in each database. Where a field has [[WikiDB/Defining data#Multi-value fields|multiple values within a single record]], there is a separate row for each value.
  
 
Purpose:
 
Purpose:
Line 47: Line 62:
 
Needs to be updated:
 
Needs to be updated:
 
* Whenever wikidb_rowdata is updated.
 
* Whenever wikidb_rowdata is updated.
* If we need to cache formatted versions of the fields (which we probably do), whenever wikidb_tables is updated.
+
* Whenever wikidb_tables is updated, as this affects the way we store the data in this table.
 +
* Whenever [[WikiDB/Tutorial: Creating your own data type|data types]] are added, removed or have their data-handling behaviour modified.
  
 
Considerations:
 
Considerations:
 
* All values are in the same text field.
 
* All values are in the same text field.
* Sorting via this field should always give the correct order (whether numeric or alphabetic).
+
* Sorting via this field should always give the correct order (whether the values are numeric or alphabetic).
** One solution is for numeric values to be stored zero-paddedE.g. 9.5 becomes 00000009.50000000
+
** This is handled by ensuring that all data is stored in a format that ensures an appropriate order when a text sort is used, including when a field contains more than one data typeFor example, numeric values are padded with spaces, so <code>9.5</code> becomes <code>&nbsp;p&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;9.5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>, where the string starts with a space (so numbers sort before strings) followed by a character to indicate whether it's positive or negative, and then uses spaces to ensure the decimal point always aligns.
** Another solution is to cast the data in the query, based on the type of the field.  However this may not be possible for various reasons.
+
** An alternative approach, which will probably improve performance (at the expense of disk space) would be to have multiple columns, one per supported data type, and store the data in multiple representations.
 +
** A third option would be to CAST() or otherwise manipulate the stored data at the point the queries are run, but this feels like it would give too large a performance hit, would be harder to make portable across DB engines and may not be possible for all data types.  However, it would remove the need for the deferred updates, so table changes have immediate effect.
 +
* When a table definition changes, all rows in the affected table are marked as stale and are regenerated slowly over time.  This is to avoid performance issues when a high-usage table is modified.  The [[WikiDB/Maintenance scripts#RefreshStaleData.php|RefreshStaleData maintenance script]] can be used to force an update more quickly.

Latest revision as of 22:33, 7 February 2022

There are three additional tables that WikiDB will add to the database in order to store the data required by the extension. In all cases, the data stored in these tables is derived from data elsewhere on your wiki, therefore it is always safe to delete and rebuild these tables, if necessary.

If you have any suggestions about how these tables could be further optimised, please discuss them on the talk page.



IMPORTANT NOTES - READ THESE BEFORE YOU CONTINUE!

  1. WikiDB comes with scripts to add these tables to the database, and to populate them with data (or to reset/re-populate them if necessary, e.g. because your configuration has changed). The scripts will also allow you to upgrade existing installations if the schema has changed. Therefore, so long as you have command-line access to run the maintenance scripts then you should never need to manually modify the database.
  2. The schema may change as the extension develops and changes may be incompatible with the current version. It may not be obvious how to change from one version to another, and if the incorrect version is used, the behaviour is undefined. Therefore, take care if you are unable to run the command-line scripts, and so need to modify the tables manually.
  3. If a new release makes changes to the schema, then the major version of the WikiDB software will be incremented, so it should be easy to tell if you need to update your schema. See WikiDB/Versioning for more details.
  4. The table definitions on this page are live, and are pulled from the DB that is powering this wiki. Any changes made to the live DB will automatically be reflected here (which is achieved using my Debugging tools extension). This has two implications. Occasionally I upgrade the wiki before a stable release is made available, for testing purposes. In this case the schema shown here may be more recent than the latest released version. Check Special:Version to see what version of WikiDB this wiki is running. Secondly, because these are updated live and are not shown in the page history, there is no easy way to view the schema for old versions of WikiDB. If this is a problem, you should check the SQL in ~/maintenance/sql/tables.sql, though this is not in a format that is ready to be run directly on your database.
  5. I cannot be held responsible for any problems or damage that may arise from using the software, deploying with the wrong schema or running the SQL on this page... standard disclaimer, etc. :)


wikidb_tables[edit]

Live Table Definition: wikidb_tables

CREATE TABLE IF NOT EXISTS `wikidb_tables` (
  `table_namespace` int(11) NOT NULL,
  `table_title` varchar(255) NOT NULL,
  `table_def` mediumblob NOT NULL,
  `redirect_namespace` int(11) DEFAULT NULL,
  `redirect_title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`table_namespace`,`table_title`),
  KEY `wikidb_tables_redirect_ns_title` (`redirect_namespace`,`redirect_title`)
) ENGINE=InnoDB;

Contains information about all tables defined in the wiki.

Purpose:

  • To stop us having to parse the table's article text whenever data is requested.

Needs to be updated:

  • Whenever a page in a DB namespace is created/saved.
  • Whenever a page in a DB namespace is moved.
  • Whenever a page in a DB namespace is deleted/undeleted.
  • Whenever the $wgWikiDBNamespaces configuration setting is changed.

Considerations:

  • If you remove a table namespace redundant records will be left in this table. This may affect data display.
  • If you convert an existing namespace to a table namespace, this table will not be updated.
  • Both of the above issues can be resolved by running the UpdateTableNS maintenance script.


wikidb_rowdata[edit]

Live Table Definition: wikidb_rowdata

CREATE TABLE IF NOT EXISTS `wikidb_rowdata` (
  `row_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `page_namespace` int(11) NOT NULL,
  `page_title` varchar(255) NOT NULL,
  `table_namespace` int(11) NOT NULL,
  `table_title` varchar(255) NOT NULL,
  `parsed_data` mediumblob NOT NULL,
  `is_stale` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`row_id`),
  KEY `wikidb_rowdata_table_ns_title` (`table_namespace`,`table_title`),
  KEY `wikidb_rowdata_ns_title` (`page_namespace`,`page_title`),
  KEY `wikidb_rowdata_is_stale` (`is_stale`)
) ENGINE=InnoDB;

Contains a record for each row in each database. These may defined anywhere in the wiki.

Purpose:

  • To avoid having to parse all articles whenever data is required.

Needs to be updated:

  • Whenever any page is created/saved.
  • Whenever any page is moved.
  • Whenever any page is deleted/undeleted.
  • If we decide to cache formatted versions of the field data, whenever wikidb_tables is updated.


wikidb_fielddata[edit]

Live Table Definition: wikidb_fielddata

CREATE TABLE IF NOT EXISTS `wikidb_fielddata` (
  `row_id` int(8) unsigned NOT NULL,
  `field_name` varchar(255) NOT NULL DEFAULT '',
  `field_value` varchar(255) NOT NULL DEFAULT '',
  KEY `wikidb_fielddata_row_id` (`row_id`),
  KEY `wikidb_fielddata_field_name` (`field_name`(75),`field_value`(75))
) ENGINE=InnoDB;

Contains a record for each field in each row in each database. Where a field has multiple values within a single record, there is a separate row for each value.

Purpose:

  • To be searchable and sortable.
  • Actual data for display is pulled from the row data.

Needs to be updated:

  • Whenever wikidb_rowdata is updated.
  • Whenever wikidb_tables is updated, as this affects the way we store the data in this table.
  • Whenever data types are added, removed or have their data-handling behaviour modified.

Considerations:

  • All values are in the same text field.
  • Sorting via this field should always give the correct order (whether the values are numeric or alphabetic).
    • This is handled by ensuring that all data is stored in a format that ensures an appropriate order when a text sort is used, including when a field contains more than one data type. For example, numeric values are padded with spaces, so 9.5 becomes  p        9.5        , where the string starts with a space (so numbers sort before strings) followed by a character to indicate whether it's positive or negative, and then uses spaces to ensure the decimal point always aligns.
    • An alternative approach, which will probably improve performance (at the expense of disk space) would be to have multiple columns, one per supported data type, and store the data in multiple representations.
    • A third option would be to CAST() or otherwise manipulate the stored data at the point the queries are run, but this feels like it would give too large a performance hit, would be harder to make portable across DB engines and may not be possible for all data types. However, it would remove the need for the deferred updates, so table changes have immediate effect.
  • When a table definition changes, all rows in the affected table are marked as stale and are regenerated slowly over time. This is to avoid performance issues when a high-usage table is modified. The RefreshStaleData maintenance script can be used to force an update more quickly.