Saving Shortcode Data in Meta in WordPress

Shortcodes in WordPress can be very powerful, but one fundamental thing about them isn’t so great: they keep their data relatively inaccessible by storing it in post content. That makes it difficult to, for example, query for all posts that contain a specific shortcode with a specific attribute, or efficiently batch the API requests that several shortcodes in a single post might need to make.

One solution I’ve found is to save shortcode data (all shortcodes and their attributes) in a meta key whenever a post is created or updated. You can then simply inspect this meta key and do whatever you need with the data, without having to parse the shortcodes with regex each time. Luckily, WordPress Core has several helper functions that make this easier. The get_shortcode_regex() function returns a regex pattern that matches all the shortcodes you have registered, and the shortcode_parse_atts() function extracts a shortcode’s attribute string into an array. By combining these we can create a function that accepts a string of content and returns all the shortcodes and their attributes found in it.

However, one big challenge with shortcodes is that they don’t natively support nesting the way HTML elements do. A string of shortcodes like [title text="Hello"][title text="World"][/title][/title] would not be parsed correctly, because the first closing [/title] would be treated as the closing tag of the outer [title text="Hello"] shortcode. For a string like [title text="Hello"][sub_title text="World"][/sub_title][/title], only the outer [title text="Hello"] shortcode would get expanded. This happens because the regex used by WordPress Core to expand shortcodes only searches one level deep in a single pass. You can add support for nesting shortcodes of different types by calling do_shortcode( $content ) on the $content parameter passed to your shortcode callback, but that only helps at render time; the Core regex itself still finds just one level of shortcodes per pass.
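To see this one-level-deep behavior in isolation, here is a small standalone PHP sketch. It does not load WordPress: the pattern below is a simplified, hand-written stand-in for the one get_shortcode_regex() returns (it only knows two example tags, [title] and [sub_title]), but like Core’s pattern it ends each shortcode at the first matching closing tag.

```php
<?php
// Simplified stand-in for the pattern get_shortcode_regex() returns; it is NOT
// Core's actual regex and only knows two example tags.
// Groups: 1 = tag name, 2 = attribute string, 3 = inner content.
// The lazy (.*?) stops at the FIRST matching closing tag, as Core's pattern does.
$regex = '/\[(title|sub_title)(?![\w-])([^\]]*)\](?:(.*?)\[\/\1\])?/s';

// Same-type nesting: the inner [/title] closes the outer [title].
$same       = '[title text="Hello"][title text="World"][/title][/title]';
$same_count = preg_match_all( $regex, $same, $same_matches );

// Different-type nesting: a single pass only sees the outer [title].
$mixed       = '[title text="Hello"][sub_title text="World"][/sub_title][/title]';
$mixed_count = preg_match_all( $regex, $mixed, $mixed_matches );

// Prints: 1 match, inner content: [title text="World"]
echo $same_count, ' match, inner content: ', $same_matches[3][0], PHP_EOL;

// Prints: 1 match, tag: title, inner content: [sub_title text="World"][/sub_title]
echo $mixed_count, ' match, tag: ', $mixed_matches[1][0], ', inner content: ', $mixed_matches[3][0], PHP_EOL;
```

In the first case the trailing [/title] is simply left over as text. In the second case the [sub_title] shortcode is still sitting, unparsed, in the inner content of the one match, and only a second pass over that inner content would find it.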

The use case of nesting the same type of shortcode is probably always going to be a bad idea until WordPress Core supports it (unlikely), but nesting of different types of shortcodes is much less problematic and much more common (because two different closing tags like [/sub_title][/title] can be distinguished). An ideal shortcode parsing solution would therefore be able to extract shortcodes and their attributes even when they are nested like this, and ideally this parsing would be done as efficiently as possible (once per blob of content rather than once per shortcode).

After playing around with this problem for a while I’ve got a working solution that I’m ready to share. It comes in two pieces. One is a function that takes an array or string of content and does the actual extraction of shortcode data. The other is a function hooked on the save_post action that calls the extraction function and passes the resulting array of shortcode data to a hook. Passing the data to a hook is key: the parsing happens once and picks up the data for every registered shortcode, and any other code that wants to do something special with a particular shortcode’s data can hook in and update its own meta key from there. Alternatively, you can hook in and simply save all the data under a single meta key.

Here is what the two pieces look like:

<?php

add_action( 'save_post', 'xxx_extract_shortcode_atts', 10, 3 );
/**
 * Extract all shortcodes and their attributes, build a usable array with this
 * data, and pass this to a hook so other code can handle saving values in meta.
 *
 * @param  int      $post_id  The post ID being saved.
 * @param  WP_Post  $post     The post object being saved.
 * @param  bool     $update   Whether the post is being created or updated.
 */
function xxx_extract_shortcode_atts( $post_id, $post, $update ) {

	if ( 'auto-draft' === $post->post_status || 'revision' === $post->post_type ) {
		return;
	}

	$shortcode_regex = get_shortcode_regex();
	$shortcode_data  = xxx_recursively_parse_nested_shortcodes( $shortcode_regex, $post->post_content );

	/**
	 * Allow other code to save data from shortcodes in post content whenever
	 * a post is created or updated.
	 *
	 * @param  array    $shortcode_data  The array of shortcode data in the post.
	 * @param  int      $post_id         The post ID being saved.
	 * @param  WP_Post  $post            The post object being saved.
	 */
	do_action( 'xxx_save_post_shortcodes', $shortcode_data, $post_id, $post );
}

/**
 * Return an array of all shortcodes in $content and their attributes.
 *
 * This function will recurse to handle nested shortcodes. There are several tricky
 * things about the logic here, and this is mostly a result of the way Core handles
 * shortcode parsing (see $matches after the preg_match_all() call, for example), but
 * the basic strategy is that we take a pass for each level of shortcode nesting and
 * build an array of all the shortcodes and their attributes at that level, then we
 * recurse as needed, passing along the array we're building, until we do a pass where
 * no shortcodes are detected, then we return our built array.
 *
 * For the structure of $matches, see /wp-includes/shortcodes.php.
 *
 * @param   string        $regex     The regex to match all shortcodes.
 * @param   string|array  $content   The content to search, or an array of inner content strings from the previous pass.
 * @param   array         $existing  The tracking array of existing matches.
 *
 * @return  array                    The complete array of nested shortcodes.
 */
function xxx_recursively_parse_nested_shortcodes( $regex, $content, $existing = array() ) {

	// Maybe concatenate $content values from previous matching,
	// and support $content being a string or an array.
	if ( is_array( $content ) ) {
		$content = implode( ' ', $content );
	}

	// Search for shortcodes.
	$count = preg_match_all( "/$regex/", $content, $matches );

	// If we have shortcodes, extract their attributes and maybe recurse,
	// otherwise return the shortcodes we've already found.
	if ( $count ) {

		// Loop over each attribute string, extract attributes, and save them
		// in an array keyed by the shortcode name.
		foreach ( $matches[3] as $index => $attributes ) {

			// Create a key for the shortcode name if we haven't already.
			if ( empty( $existing[ $matches[2][ $index ] ] ) ) {
				$existing[ $matches[2][ $index ] ] = array();
			}

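			// Parse shortcode atts. shortcode_parse_atts() returns a string instead
			// of an array when there are no attributes, so normalize it to an array.
			$shortcode_data = shortcode_parse_atts( $attributes );
			if ( ! is_array( $shortcode_data ) ) {
				$shortcode_data = array();
			}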

			// Save the found attributes under the key.
			$existing[ $matches[2][ $index ] ][] = $shortcode_data;
		}

		// Recurse as needed.
		return xxx_recursively_parse_nested_shortcodes( $regex, $matches[5], $existing );

	} else {

		return $existing;
	}
}

This is admittedly a function that takes time to fully grok. The key to making it work is the recursion. Since the WordPress Core shortcode regex only parses one level deep on each pass, we need to keep recursing on the strings of content between shortcode opening and closing tags until we take a pass where we don’t find any more shortcodes, then we return the array that we keep building with each pass.

Given the following content string:

[heading level="2"]Best Heading Ever[/heading][blockquote style="wide"]Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla elit diam, dignissim sodales hendrerit eget, dictum vel lorem.[/blockquote][button text="Visit Site" url="https://example.com" color="blue"][/button]

[product id="1234"][button text="Add to Cart" url="https://example.com/cart/add?id=1234" color="green"][/button][/product]

[product id="5678"][button text="Add to Cart" url="https://example.com/cart/add?id=5678" color="green"][/button][/product]

The shortcode data array looks like this:

Array
(
    [heading] => Array
        (
            [0] => Array
                (
                    [level] => 2
                )

        )

    [blockquote] => Array
        (
            [0] => Array
                (
                    [style] => wide
                )

        )

    [button] => Array
        (
            [0] => Array
                (
                    [text] => Visit Site
                    [url] => https://example.com
                    [color] => blue
                )

            [1] => Array
                (
                    [text] => Add to Cart
                    [url] => https://example.com/cart/add?id=1234
                    [color] => green
                )

            [2] => Array
                (
                    [text] => Add to Cart
                    [url] => https://example.com/cart/add?id=5678
                    [color] => green
                )

        )

    [product] => Array
        (
            [0] => Array
                (
                    [id] => 1234
                )

            [1] => Array
                (
                    [id] => 5678
                )

        )
)
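Because the recursion is easier to follow when you can run it, here is a standalone sketch of the same pass-by-pass strategy that works without WordPress. It is an illustration under assumptions, not the function above: the regex is a simplified, hand-written stand-in for get_shortcode_regex() that only knows three example tags, and demo_parse_atts() is a hypothetical, minimal replacement for shortcode_parse_atts() that only understands key="value" pairs.

```php
<?php
// Simplified stand-in for get_shortcode_regex(); NOT Core's actual pattern.
// Groups: 1 = tag name, 2 = attribute string, 3 = inner content.
$demo_regex = '/\[(heading|button|product)(?![\w-])([^\]]*)\](?:(.*?)\[\/\1\])?/s';

// Hypothetical, minimal stand-in for shortcode_parse_atts(): key="value" pairs only.
function demo_parse_atts( string $text ): array {
	preg_match_all( '/([\w-]+)="([^"]*)"/', $text, $m );
	return array_combine( $m[1], $m[2] );
}

// One regex pass per nesting level: record every match at this level, then
// recurse on the concatenated inner content until a pass finds nothing.
function demo_parse_nested( string $regex, $content, array $existing = array() ): array {
	if ( is_array( $content ) ) {
		$content = implode( ' ', $content );
	}

	if ( ! preg_match_all( $regex, $content, $matches ) ) {
		return $existing;
	}

	foreach ( $matches[1] as $index => $tag ) {
		$existing[ $tag ][] = demo_parse_atts( $matches[2][ $index ] );
	}

	return demo_parse_nested( $regex, $matches[3], $existing );
}

$content = '[heading level="2"]Best Heading Ever[/heading]'
	. '[button text="Visit Site" color="blue"][/button]'
	. '[product id="1234"][button text="Add to Cart" color="green"][/button][/product]';

print_r( demo_parse_nested( $demo_regex, $content ) );
```

On this input the first pass finds [heading], the top-level [button], and [product]; the second pass runs over the concatenated inner content and finds the nested [button]; the third pass finds no shortcodes and returns the array, with two entries under button.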

Here’s an example of using this to save product ids from shortcodes like [product id="1234"]:

<?php

add_action( 'xxx_save_post_shortcodes', 'xxx_products_save_product_ids', 10, 3 );
/**
 * Save product ids from [product] shortcodes in meta whenever a post is saved.
 *
 * @param  array    $shortcode_data  The array of shortcode data in the post.
 * @param  int      $post_id         The post ID being saved.
 * @param  WP_Post  $post            The post object being saved.
 */
function xxx_products_save_product_ids( $shortcode_data, $post_id, $post ) {

	$product_ids = array();

	// If we have any [product] shortcodes, loop over each instance
	// of them and build our array of product ids.
	if ( array_key_exists( 'product', $shortcode_data ) ) {
		foreach ( $shortcode_data['product'] as $shortcode_atts ) {
			if ( ! empty( $shortcode_atts['id'] ) ) {
				$product_ids[] = $shortcode_atts['id'];
			}
		}
	}

	// Sanitize first, so values that differ only in stray whitespace or markup
	// collapse to the same id before duplicates are removed.
	$product_ids = array_map( 'sanitize_text_field', $product_ids );

	// Filter out duplicates and reindex.
	$product_ids = array_values( array_unique( $product_ids ) );

	// If we found any product ids, save them, otherwise delete previous meta values.
	if ( ! empty( $product_ids ) ) {
		update_post_meta( $post_id, '_xxx_products_in_content', $product_ids );
	} else {
		delete_post_meta( $post_id, '_xxx_products_in_content' );
	}
}

You could always save the entire array of shortcode data under a single meta key, but I usually find that I want separate meta keys for specific shortcodes. The beauty of passing the shortcode data to an action hook is that any other code can hook in and save whatever meta keys it needs without running any extra regex passes.

Bonus

Thinking more about the nested shortcodes use case, what if you also want to know the parent shortcode of a specific shortcode instance? Knowing this would let you manipulate a meta key only when a specific shortcode is inside another shortcode. This can be accomplished with a slightly modified version of the original function. It keeps the same name, so it replaces the earlier version rather than sitting alongside it:

<?php

/**
 * Return an array of all shortcodes in $content, their attributes, and their parent shortcodes.
 *
 * @param   string        $regex        The regex to match all shortcodes.
 * @param   string|array  $content      The content to search, or an array of inner content strings from the previous pass.
 * @param   array         $existing     The tracking array of existing matches.
 * @param   array         $parent_tags  The parent shortcodes from a previous pass.
 *
 * @return  array                       The complete array of nested shortcodes.
 */
function xxx_recursively_parse_nested_shortcodes( $regex, $content, $existing = array(), $parent_tags = array() ) {

	// Maybe concatenate $content values from previous matching,
	// and support $content being a string or an array.
	if ( is_array( $content ) ) {
		$content = implode( ' ', $content );
	}

	// Search for shortcodes.
	$count = preg_match_all( "/$regex/", $content, $matches );

	// If we have shortcodes, extract their attributes and maybe recurse,
	// otherwise return the shortcodes we've already found.
	if ( $count ) {

		// Reindex the array of $parent_tags.
		$parent_tags = array_values( $parent_tags );

		// Loop over each attribute string, extract attributes, and save them
		// in an array keyed by the shortcode name.
		foreach ( $matches[3] as $index => $attributes ) {

			// Create a key for the shortcode name if we haven't already.
			if ( empty( $existing[ $matches[2][ $index ] ] ) ) {
				$existing[ $matches[2][ $index ] ] = array();
			}

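			// Parse shortcode atts. shortcode_parse_atts() returns a string instead
			// of an array when there are no attributes, which would break the
			// array_merge() below, so normalize it to an array.
			$shortcode_data = shortcode_parse_atts( $attributes );
			if ( ! is_array( $shortcode_data ) ) {
				$shortcode_data = array();
			}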

			// If the shortcode has a parent shortcode, add an attribute to
			// indicate this.
			if ( ! empty( $parent_tags[ $index ] ) ) {

				$parent_tag = array(
					'parent_shortcode' => $parent_tags[ $index ],
				);

				$shortcode_data = array_merge( $shortcode_data, $parent_tag );
			}

			// Save the found attributes under the key.
			$existing[ $matches[2][ $index ] ][] = $shortcode_data;
		}

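		// Set up parent tag matching for the next pass. The next pass finds
		// shortcodes in the concatenated inner content in order, so each parent
		// tag is repeated once per child shortcode it contains. That keeps match
		// N on the next pass aligned with entry N in $next_parent_tags, even when
		// a shortcode contains several children. Inner content without child
		// shortcodes is dropped, since it can't contribute any matches.
		$next_content     = array();
		$next_parent_tags = array();

		foreach ( $matches[5] as $index => $parent_content ) {

			$child_count = preg_match_all( "/$regex/", $parent_content );

			if ( ! $child_count ) {
				continue;
			}

			$next_content[] = $parent_content;

			for ( $i = 0; $i < $child_count; $i++ ) {
				$next_parent_tags[] = $matches[2][ $index ];
			}
		}

		// Recurse as needed.
		return xxx_recursively_parse_nested_shortcodes( $regex, $next_content, $existing, $next_parent_tags );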

	} else {

		return $existing;
	}
}

This version adds an extra preg_match_all() call on each shortcode’s inner content at every nesting level, so if you don’t need to know the parent shortcodes, the simpler version will be faster. In the shortcode data array, each shortcode instance that has a parent gets a parent_shortcode attribute containing the parent’s tag name.

Here’s what the same content string from above looks like when run through this version of the extraction function:

Array
(
    [heading] => Array
        (
            [0] => Array
                (
                    [level] => 2
                )

        )

    [blockquote] => Array
        (
            [0] => Array
                (
                    [style] => wide
                )

        )

    [button] => Array
        (
            [0] => Array
                (
                    [text] => Visit Site
                    [url] => https://example.com
                    [color] => blue
                )

            [1] => Array
                (
                    [text] => Add to Cart
                    [url] => https://example.com/cart/add?id=1234
                    [color] => green
                    [parent_shortcode] => product
                )

            [2] => Array
                (
                    [text] => Add to Cart
                    [url] => https://example.com/cart/add?id=5678
                    [color] => green
                    [parent_shortcode] => product
                )

        )

    [product] => Array
        (
            [0] => Array
                (
                    [id] => 1234
                )

            [1] => Array
                (
                    [id] => 5678
                )

        )
)

Pretty neat. Now we can see that the second and third instances of the button shortcode are inside product shortcodes.
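The parent tracking relies on index alignment: parent tag N from one pass is assumed to belong to match N on the next pass, which only holds if each parent tag is recorded once for every child shortcode it directly contains. The standalone sketch below shows that bookkeeping with a [product] that wraps two [button] shortcodes. It does not load WordPress: the regex is a simplified, hand-written stand-in for get_shortcode_regex(), and demo_parse_atts() is a hypothetical, minimal replacement for shortcode_parse_atts().

```php
<?php
// Simplified stand-in for get_shortcode_regex(); NOT Core's actual pattern.
// Groups: 1 = tag name, 2 = attribute string, 3 = inner content.
$demo_regex = '/\[(button|product)(?![\w-])([^\]]*)\](?:(.*?)\[\/\1\])?/s';

// Hypothetical, minimal stand-in for shortcode_parse_atts(): key="value" pairs only.
function demo_parse_atts( string $text ): array {
	preg_match_all( '/([\w-]+)="([^"]*)"/', $text, $m );
	return array_combine( $m[1], $m[2] );
}

function demo_parse_with_parents( string $regex, $content, array $existing = array(), array $parent_tags = array() ): array {
	if ( is_array( $content ) ) {
		$content = implode( ' ', $content );
	}

	if ( ! preg_match_all( $regex, $content, $matches ) ) {
		return $existing;
	}

	$next_parent_tags = array();

	foreach ( $matches[1] as $index => $tag ) {
		$atts = demo_parse_atts( $matches[2][ $index ] );

		// Match N on this pass belongs to parent tag N from the previous pass.
		if ( isset( $parent_tags[ $index ] ) ) {
			$atts['parent_shortcode'] = $parent_tags[ $index ];
		}

		$existing[ $tag ][] = $atts;

		// Repeat this tag once per direct child so the next pass stays aligned.
		$child_count = preg_match_all( $regex, $matches[3][ $index ] );
		for ( $i = 0; $i < $child_count; $i++ ) {
			$next_parent_tags[] = $tag;
		}
	}

	return demo_parse_with_parents( $regex, $matches[3], $existing, $next_parent_tags );
}

$content = '[product id="1"][button text="A"][/button][button text="B"][/button][/product]'
	. '[button text="C"][/button]';

print_r( demo_parse_with_parents( $demo_regex, $content ) );
```

Both nested buttons come back with parent_shortcode set to product, and the top-level [button text="C"] has none. If each parent tag were recorded only once instead of once per child, the second nested button would be paired with the wrong entry in the parent tags array, or with none at all.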

I’m now using this system to save all kinds of meta and then directly accessing shortcode data through meta on the sites that I build. So far it’s been much more satisfying than relying on regex parsing every time I need this data, and I’ve only scratched the surface of what is possible with this approach.