Some parts of Magento 2 are oriented towards flexibility, while others seem to have been designed to cause headaches. One of the areas that makes me frown is the Magento 2 robots router. Here's why.
The goal: Creating a robots.txt
This story starts very simple: Let's create a robots.txt output when the URL /robots.txt is requested. While you might expect this to be accomplished via a simple file in the pub/ folder, the Magento_Robots module actually takes a different route (pun intended). Here we go.
Stage 1: A router
The first step is to realize that the file robots.txt does not exist by default in the pub/ folder. This causes an HTTP request for the URL /robots.txt to be caught by Magento (either with an Apache .htaccess rule or with an Nginx location match), so that Magento is able to determine the right output. So far so good.
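For illustration, the relevant part of Magento's sample Nginx configuration boils down to something like this (simplified; the real nginx.conf.sample is more elaborate):

```nginx
# Any URI that does not resolve to an actual file under pub/ is handed
# over to Magento's front controller - including /robots.txt.
location / {
    try_files $uri $uri/ /index.php$is_args$args;
}
```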
With Magento's default router, any URL is matched against the pattern frontName/controllerPath/actionClass, so that the URL checkout/index/index ends up with the Magento_Checkout module. There is no module with the frontname robots.txt, so this request would normally die.
The Magento_Robots module, however, adds its own router (via etc/frontend/di.xml) which intercepts any request for robots.txt and forwards it to the path robots/index/index, which ends up being caught by an action class (aka controller).
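The registration looks roughly like this (a simplified sketch of the module's etc/frontend/di.xml; the exact sortOrder may differ):

```xml
<type name="Magento\Framework\App\RouterList">
    <arguments>
        <argument name="routerList" xsi:type="array">
            <!-- Adds the robots router to the list of routers Magento
                 tries, one by one, until a match is found -->
            <item name="robots" xsi:type="array">
                <item name="class" xsi:type="string">Magento\Robots\Controller\Router</item>
                <item name="disable" xsi:type="boolean">false</item>
                <item name="sortOrder" xsi:type="string">10</item>
            </item>
        </argument>
    </arguments>
</type>
```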
Stage 2: An action
The action class \Magento\Robots\Controller\Index\Index is quite simple: It creates a result page object with the layout handle robots_index_index. And note that the Content-Type header is set to text/plain: we're generating a page based on text, not HTML. The result page then calls upon the layout.
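A simplified sketch of what that action class does (not the verbatim core code; the constructor injecting the result page factory is omitted for brevity):

```php
<?php
namespace Magento\Robots\Controller\Index;

class Index extends \Magento\Framework\App\Action\Action
{
    public function execute()
    {
        // Build a result page that renders the robots_index_index
        // layout handle, and announce plain text instead of HTML
        $resultPage = $this->resultPageFactory->create();
        $resultPage->addHandle('robots_index_index');
        $resultPage->setHeader('Content-Type', 'text/plain');
        return $resultPage;
    }
}
```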
Stage 3: The layout
The layout handle robots_index_index corresponds to an XML layout file robots_index_index.xml. But instead of extending the normal default layout, this file calls upon a custom XML page layout robots (view/frontend/page_layout/robots.xml). This makes sure that all regular containers and blocks are gone, with only the root container remaining. That root container is then filled with content from a block class.
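The layout file boils down to something like this (simplified sketch):

```xml
<page layout="robots"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <body>
        <!-- The bare "robots" page layout leaves only the root container,
             which is filled with a single block -->
        <referenceContainer name="root">
            <block class="Magento\Robots\Block\Data" name="robots.data"/>
        </referenceContainer>
    </body>
</page>
```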
Stage 4: The block
The block class Magento\Robots\Block\Data renders non-HTML output via its _toHtml() method. Right. Luckily, it does not extend the template class. Perhaps one of the reasons for this approach is caching: if the page cache is enabled, the /robots.txt page will be cached as well.
The actual output is retrieved from a model.
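Sketched, the block is little more than a proxy (constructor injecting the model is omitted; the real class also implements IdentityInterface so the page cache can tag it):

```php
<?php
namespace Magento\Robots\Block;

class Data extends \Magento\Framework\View\Element\AbstractBlock
{
    protected function _toHtml()
    {
        // Despite the method name, plain text comes out here:
        // the block simply returns whatever the model delivers
        return $this->robots->getData();
    }
}
```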
Stage 5: The model
The model class \Magento\Robots\Model\Robots retrieves a value from the configuration path design/search_engine_robots/custom_instructions, which may have been filled with custom instructions. In my case, it is empty.
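Roughly, the model does nothing more than this (a sketch; the exact scope constant may differ):

```php
<?php
// Read the custom robots instructions from the configuration,
// scoped per website so each website can override the value
public function getData()
{
    return (string)$this->scopeConfig->getValue(
        'design/search_engine_robots/custom_instructions',
        \Magento\Store\Model\ScopeInterface::SCOPE_WEBSITE
    );
}
```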
My head hurts
If you have followed along, you can see that this entire functionality is quite complex, especially considering that the actual output is not dynamically created, but simply read from a value manually saved to the configuration table. Even worse, in my own case (my development environment, OK) that value is empty. Which means that the entire architecture was meant to deliver an empty output.
And this took 152ms. Imagine all this CPU power across all of these Magento shops in production, and we have just burned down part of the Amazon forest.
Improvement: Create a file instead
One simple improvement is to throw away the entire module, simply add a file pub/robots.txt to your Magento files, and be done with it. This has two benefits: First of all, you'll understand things a lot better. Second, the request speed goes up, because any kind of page handling (Magento without FPC, Magento with FPC, or Varnish) is replaced with a simple file request (5ms at most).
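In practice this is a one-liner (the contents below are just a common permissive default, not what Magento would have generated for you):

```shell
# From the Magento root: make sure pub/ exists (it always does in a
# real install) and drop a static robots.txt into it
mkdir -p pub
printf 'User-agent: *\nDisallow:\n' > pub/robots.txt
```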
But then we lose the ability to override this value per website scope.
Improvement: Create a better router instead
Another way might be to create a new router instead. Let that router inject the configuration and simply output that same configuration value when requested. Gone are the action, the layout, the block and the model, and the page speed will still be ok-ish. I'm not sure about the cacheability of this, but at least things are less complex.
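Such a router could look roughly like this (an untested sketch under my own Acme namespace, registered via di.xml just like the original; NoopAction is a hypothetical action that simply returns the prepared response):

```php
<?php
namespace Acme\SimpleRobots\App;

use Magento\Framework\App\RequestInterface;
use Magento\Framework\App\RouterInterface;

class RobotsRouter implements RouterInterface
{
    // ... constructor injecting ScopeConfigInterface, the response
    // object and an action factory is omitted for brevity ...

    public function match(RequestInterface $request)
    {
        if (trim($request->getPathInfo(), '/') !== 'robots.txt') {
            return null; // not ours, let the next router have a go
        }
        // Serve the configured value directly: no action class,
        // no layout, no block, no model
        $content = (string)$this->scopeConfig->getValue(
            'design/search_engine_robots/custom_instructions'
        );
        $this->response->setHeader('Content-Type', 'text/plain');
        $this->response->setBody($content);
        // A router must return an action instance; a tiny no-op
        // action that just returns the response completes the picture
        return $this->actionFactory->create(NoopAction::class);
    }
}
```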
Anyway, I find this a good example of how Magento can sometimes be overengineered.