SQL Engine Examples
Mastodon Mention Detection
The most complete real-world example: detect Mastodon's mention microformat (a <span class="h-card"> wrapping an <a class="u-url mention">) and collapse it into a single mention feature.
import { from } from 'relational-text'
import { applyFacetSQL } from 'relational-text/sql'
const html = `
<p>Hello
<span class="h-card">
<a href="https://mastodon.social/@alice" class="u-url mention">@alice</a>
</span>!
</p>
`
const doc = from('html', html)
const detected = applyFacetSQL(doc, `
-- Synthesize a mention feature by joining the h-card span and its inner anchor
INSERT INTO features (byte_start, byte_end, type_id, name, data)
SELECT
span.byte_start,
span.byte_end,
'org.joinmastodon.facet',
'mention',
json_object(
'href', json_get(a.data, 'href'),
'handle', ltrim(text_at(span.byte_start, span.byte_end), '@')
)
FROM features AS span
JOIN features AS a
ON a.byte_start = span.byte_start
AND a.byte_end = span.byte_end
WHERE span.name = 'span' AND class_contains(span.data, 'h-card')
AND a.name = 'a' AND class_contains(a.data, 'u-url');
-- Remove the now-redundant h-card span and u-url anchor
DELETE FROM features WHERE name = 'span' AND class_contains(data, 'h-card');
DELETE FROM features WHERE name = 'a' AND class_contains(data, 'u-url');
`)How this works:
FROM features AS span— scan all features, aliasing each row asspanJOIN features AS a ON a.byte_start = span.byte_start AND a.byte_end = span.byte_end— find pairs of features that cover the exact same byte range (the<span>wrapper and the<a>link share the same range because the anchor fills the span)WHERE span.name = 'span' AND class_contains(span.data, 'h-card')— the outer element must be a<span class="h-card">(Mastodon's microformat for user cards)AND a.name = 'a' AND class_contains(a.data, 'u-url')— the inner element must be an<a class="u-url mention">(the actual mention link)json_object('href', json_get(a.data, 'href'), 'handle', ltrim(...))— build the output feature's attributes: extract thehreffrom the link, and derive thehandleby stripping the leading@from the visible textINSERT INTO features— create a neworg.joinmastodon.facet#mentionfeature with these combined attributes- The two
DELETEstatements — remove the originalspanandafeatures since they've been collapsed into the singlementionfeature
The result has a single org.joinmastodon.facet#mention feature with href and handle attributes, replacing the two-element HTML structure.
Class-Based Feature Mapping
Map <span class="highlight"> to a custom highlight mark:
const result = applyFacetSQL(doc, `
INSERT INTO features (byte_start, byte_end, type_id, name, data)
SELECT byte_start, byte_end, 'com.example.facet', 'highlight', '{}'
FROM features
WHERE name = 'span' AND class_contains(data, 'highlight');
DELETE FROM features
WHERE name = 'span' AND class_contains(data, 'highlight');
`)How this works:
SELECT ... FROM features WHERE name = 'span' AND class_contains(data, 'highlight')— find all<span>elements with CSS classhighlightINSERT INTO features ... 'com.example.facet', 'highlight', '{}'— create a new feature in a custom namespace for each match, with no attributesDELETE FROM features WHERE ...— remove the original HTML spans so the document has clean semantic features instead of presentation markup
Renaming a Namespace
Migrate all features from one namespace to another:
const result = applyFacetSQL(doc, `
UPDATE features
SET type_id = replace(type_id, 'org.w3c.html.facet', 'com.example.html.facet')
WHERE type_id LIKE 'org.w3c.html.facet%'
`)This uses SQL's replace() function to rewrite the type_id string for all matching features in one statement.
Text-Derived Attributes
Extract the displayed text as an attribute (e.g., for mention normalization):
const result = applyFacetSQL(doc, `
UPDATE features
SET data = json_object(
'handle', ltrim(text_at(byte_start, byte_end), '@'),
'href', json_get(data, 'href')
)
WHERE name = 'mention'
`)This rebuilds the feature's data JSON from scratch: it keeps the existing href and adds a handle derived from the document text (with the leading @ stripped via ltrim).
Splitting a Feature by Attribute
Convert headings stored as a single heading feature with a level attribute into h1/h2/... features using CASE:
UPDATE features
SET name = CASE json_get(data, 'level')
WHEN '1' THEN 'h1'
WHEN '2' THEN 'h2'
WHEN '3' THEN 'h3'
WHEN '4' THEN 'h4'
WHEN '5' THEN 'h5'
WHEN '6' THEN 'h6'
ELSE 'h1'
END,
data = json_object()
WHERE name = 'heading'How this works:
WHERE name = 'heading'— target all heading features (which store their level in an attribute rather than their name)CASE json_get(data, 'level')— read thelevelattribute from each heading's data JSONWHEN '1' THEN 'h1'— map each level value to the corresponding HTML-style namedata = json_object()— replace the data with an empty object since the level information is now encoded in the feature name
Dropping Unknown Features
Remove all features from a namespace that are not in an allowlist:
DELETE FROM features
WHERE type_id = 'org.w3c.html.facet'
AND name NOT IN ('strong', 'em', 'code', 'a', 'p', 'h1', 'h2', 'h3')Any feature whose name is not in the allowlist is deleted. This is useful for sanitizing documents to a known-safe subset of formatting.
Reading Feature Data
Query features to inspect a document's structure:
import { from, Document } from 'relational-text'
import { applyFacetSQL } from 'relational-text/sql'
// applyFacetSQL returns a Document — use toJSON() to inspect
const doc = from('markdown', '## Hello\n\nParagraph with **bold**.')
const result = applyFacetSQL(doc, `
SELECT byte_start, byte_end, type_id, name, data FROM features
`)
// The result document has the same features — use doc.toJSON().facets to inspect
console.log(doc.toJSON().facets)TIP
applyFacetSQL always returns a Document. To read query results, inspect the returned document's facets array via doc.toJSON().facets. SELECT statements that don't modify the table return the document unchanged.