Attribute Syntax & Parsing
How inline attributes on the opening tag line are parsed and merged by Docara's Attrs helper.
Quick summary
- Supported forms on the open line:
  - Key-value pairs: key="value", key='value', key=value
  - Shorthands: .class (append to class), #id (set id)
- Classes are concatenated and deduplicated; for all other keys, the last write wins.
- Parsing assumes ASCII quotes (" and ') and standard spaces. See Unicode notes if your content uses smart quotes or NBSP.
API
Attrs::parseOpenLine(?string $attrStr): array
Parses the substring after !type on the opening line and returns an associative array of attributes.
Behavior (current implementation):
- Trim the string; return [] when empty.
- Initialize an empty class list; set $attrs['class'] temporarily.
- Collect all key=value pairs using the regex: /(\w[\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|(\S+))/
  - Key: a letter, digit, or underscore, followed by any of [\w:-] (ASCII only)
  - Value: either double-quoted, single-quoted, or an unquoted token (no whitespace or any of the characters " ' = < > `)
  - Special case: for key === 'class', values are split by whitespace and appended to the class list
- Remove all matched key=value pairs from the string.
- Parse shorthands with /([.#])([\w:-]+)/: a leading . appends to the class list, a leading # sets id.
- If the class list is non-empty, set $attrs['class'] to the deduplicated whitespace-joined list; otherwise remove the temporary class key.
- Return $attrs.
Current regexes are ASCII-centric (no u modifier). If you expect Unicode letters, spaces, or quotes, see Unicode notes below for a recommended enhancement.
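The steps above can be sketched as a standalone function. This is a minimal illustration written from the description, not Docara's actual source: the name parseOpenLineSketch is hypothetical, and it builds the class list separately instead of setting a temporary class key.

```php
<?php
// Hypothetical re-implementation for illustration only; not Docara's code.
function parseOpenLineSketch(?string $attrStr): array
{
    $attrStr = trim((string) $attrStr);
    if ($attrStr === '') {
        return [];
    }

    $attrs = [];
    $classes = [];

    // Collect key=value pairs: double-quoted, single-quoted, or bare token.
    $kvPattern = '/(\w[\w:-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|(\S+))/';
    if (preg_match_all($kvPattern, $attrStr, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Only one of groups 2..4 carries the value; the rest are '' or absent.
            $value = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
            if ($m[1] === 'class') {
                foreach (preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY) as $c) {
                    $classes[] = $c;
                }
            } else {
                $attrs[$m[1]] = $value; // last write wins
            }
        }
        // Remove matched pairs so only shorthands remain in the string.
        $attrStr = preg_replace($kvPattern, ' ', $attrStr);
    }

    // Shorthands: .name appends a class, #name sets the id.
    if (preg_match_all('/([.#])([\w:-]+)/', $attrStr, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            if ($m[1] === '.') {
                $classes[] = $m[2];
            } else {
                $attrs['id'] = $m[2];
            }
        }
    }

    if ($classes !== []) {
        $attrs['class'] = implode(' ', array_unique($classes));
    }
    return $attrs;
}
```

Under these assumptions, parseOpenLineSketch('.card .shadow #hero') yields ['id' => 'hero', 'class' => 'card shadow'].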
Attrs::merge(array ...$sets): array
Merges multiple attribute maps into a new array.
- Iterates left-to-right; for non-class keys, later values override earlier ones.
- For class, splits each value by whitespace, concatenates all classes, deduplicates, and rejoins with a single space.
- Returns the combined map. (If no classes are present, class is omitted.)
Example:
Attrs::merge(
['class' => 'a b', 'id' => 'x'],
['class' => 'b c', 'data-x' => '42'],
);
// => ['id' => 'x', 'data-x' => '42', 'class' => 'a b c']
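For readers who want to see how such a merge could work internally, here is a minimal sketch written from the description above. The function name mergeAttrsSketch is hypothetical and the real Attrs::merge may differ in detail.

```php
<?php
// Hypothetical stand-in for Attrs::merge(); illustration only.
function mergeAttrsSketch(array ...$sets): array
{
    $out = [];
    $classes = [];
    foreach ($sets as $set) {
        foreach ($set as $key => $value) {
            if ($key === 'class') {
                // Accumulate classes from every set, dropping empty entries.
                $parts = preg_split('/\s+/', (string) $value, -1, PREG_SPLIT_NO_EMPTY);
                $classes = array_merge($classes, $parts);
            } else {
                $out[$key] = $value; // later sets override earlier ones
            }
        }
    }
    if ($classes !== []) {
        // array_unique keeps the first occurrence, so class order is stable.
        $out['class'] = implode(' ', array_unique($classes));
    }
    return $out;
}
```

Keeping the class list separate until the end is what lets class be omitted entirely when no set contributes one.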
Examples
1) Quoted value with spaces
Open line
!example class="mb-4 border" data-x=42
Parsed attrs
['data-x' => '42', 'class' => 'mb-4 border']
2) Shorthands and id
Open line
!example .card .shadow #hero
Parsed attrs
['id' => 'hero', 'class' => 'card shadow']
3) Merge with base attributes
Base
['class' => 'example overflow-hidden']
Inline
['class' => 'mb-4 overflow-hidden', 'data-x' => '1']
Result of Attrs::merge(base, inline)
['data-x' => '1', 'class' => 'example overflow-hidden mb-4']
Note how overflow-hidden is deduplicated.
4) Duplicate id (last wins)
Attrs::merge(['id' => 'a'], ['id' => 'b']); // ['id' => 'b']
Unicode notes (smart quotes, NBSP, non-ASCII keys)
The stock regexes in Attrs::parseOpenLine() use ASCII classes (e.g., \w) and no u modifier, which means:
- Smart quotes (e.g., curly quotes) won't match the quoted branches.
- Non-breaking spaces (NBSP, \xC2\xA0) won't be treated as whitespace.
- Keys with non-ASCII letters won't match \w.
Authoring guideline: use plain ASCII quotes and spaces in the opening line, e.g., class="a b".
Recommended enhancement (optional):
- Normalize smart quotes/NBSP to ASCII before parsing, or
- Switch to Unicode-aware regexes (add the u modifier and use \p{L}/\p{N}), e.g.:
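// Pre-normalize NBSP and smart quotes to ASCII.
// The "\u{...}" escapes require PHP 7.0+ and double-quoted strings.
$attrStr = str_replace(
    ["\xC2\xA0", "\u{201C}", "\u{201D}", "\u{2018}", "\u{2019}"],
    [' ', '"', '"', "'", "'"],
    $attrStr
);
// Unicode-aware key=value pattern (double-quoted, single-quoted, or bare value)
if (preg_match_all('/([\p{L}\p{N}_][\p{L}\p{N}_:-]*)\s*=\s*(?:"([^"]*)"|' .
    "'([^']*)'" .
    '|([^\s"\'=<>`]+))/u', $attrStr, $m, PREG_SET_ORDER)) {
    // ...
}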
If you adopt the Unicode version, update both the key=value and shorthand patterns, and prefer preg_split('/\s+/u', ...) when splitting classes.
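The shorthand side of that change might look like the sketch below. It only mirrors the ASCII shorthand pattern with \p{L}/\p{N} classes and the u modifier; the function name parseShorthandsUnicodeSketch and its return shape are assumptions for illustration, not part of Docara.

```php
<?php
// Hypothetical helper: Unicode-aware version of the shorthand step.
function parseShorthandsUnicodeSketch(string $rest): array
{
    $classes = [];
    $id = null;
    // Same shape as /([.#])([\w:-]+)/, but letters and digits are Unicode-aware.
    if (preg_match_all('/([.#])([\p{L}\p{N}_:-]+)/u', $rest, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            if ($m[1] === '.') {
                $classes[] = $m[2];
            } else {
                $id = $m[2]; // last #id wins
            }
        }
    }
    return ['classes' => array_values(array_unique($classes)), 'id' => $id];
}
```

The input must be valid UTF-8; with the u modifier, preg_match_all returns false on malformed input, which this sketch treats the same as no match.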
Security & validation
- Attrs does not escape values; escaping happens at render time (HtmlElement or your custom renderer). Always escape any value you concatenate manually.
- To prevent unwanted attributes (e.g., onclick), implement a per-tag attrsFilter(array $attrs, array $meta): array and whitelist allowed keys.
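A whitelist filter along these lines could be sketched as follows. The parameter list follows the signature quoted above, but the function name attrsFilterSketch, the allowed keys, the data-* rule, and the escapeAttrValueSketch helper are all assumptions for illustration; how a filter is registered per tag is not shown here.

```php
<?php
// Hypothetical allow-list filter; keys and rules are illustrative assumptions.
function attrsFilterSketch(array $attrs, array $meta): array
{
    $allowed = ['id', 'class', 'title'];
    $out = [];
    foreach ($attrs as $key => $value) {
        $key = (string) $key;
        // Keep explicitly allowed keys and data-* attributes; drop the rest
        // (for example onclick or other event handlers).
        if (in_array($key, $allowed, true) || strncmp($key, 'data-', 5) === 0) {
            $out[$key] = $value;
        }
    }
    return $out;
}

// Escape a value before concatenating it into HTML by hand.
function escapeAttrValueSketch(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}
```

An allow-list is generally safer than a deny-list here, since any attribute not named is dropped by default.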
Edge cases & tips
- Boolean flags are not parsed; prefer flag="true" or map presence in attrsFilter().
- Repeated shorthand #id: the last value wins after Attrs::merge.
- Extra text after the closing marker is ignored by the parser; attributes must be on the opening line.
- Empty class entries are filtered out; no trailing spaces in the result.
Tests you should have
- Quoted/unquoted values; spaces inside quotes.
- Multiple classes with duplicates: deduped.
- .class + class="..." together: merged predictably.
- #id overrides base id via merge().
- (If enabled) Unicode quotes/NBSP normalization.
Authoring cheatsheet
- Add classes: .box .rounded or class="box rounded"
- Set id: #anchor
- Key-value with spaces: title="Complex value here"
- Data attributes: data-x=42 data-name='Alice'