August 9, 2022

Magento 2 robots router horror

Yireo Blog Post

Some parts of Magento 2 are oriented towards flexibility, while others seem to have been designed to cause headaches. One of the areas that makes me frown is the Magento 2 robots router. Here's why.

The goal: Creating a robots.txt output

This story starts out simple: Let's create a robots.txt output when the URL /robots.txt is requested. While you might expect this to be accomplished via a simple file in the pub/ folder, the Magento_Robots module actually takes a different route (pun intended). Here we go.

Stage 1: A router

The first step here is to realize that the file robots.txt does not exist by default in the pub/ folder. This causes an HTTP request for the URL /robots.txt to be caught by Magento (either with an Apache .htaccess rule or with an Nginx location match), so that Magento is able to determine the right output. So far so good.

With Magento's default router, any URL is matched against the pattern frontName/controllerPath/actionClass, so that a URL like checkout/index/index ends up in the Magento_Checkout module. There is no module with the frontName robots.txt, so this request would normally die.

However, the Magento_Robots module also adds its own router (via etc/frontend/di.xml), which intercepts any request for robots.txt and forwards it to the path robots/index/index, which ends up being caught by the action class (aka controller) \Magento\Robots\Controller\Index\Index.
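To make the interception step concrete, here is a minimal sketch of what such a router could look like. It is not the verbatim source of the Magento_Robots router: the namespace and class name are hypothetical, and instead of forwarding to robots/index/index it simply instantiates the action class directly. It only runs inside a Magento 2 installation.

```php
<?php
declare(strict_types=1);

namespace Vendor\Example\Controller; // hypothetical namespace, not the core module

use Magento\Framework\App\ActionFactory;
use Magento\Framework\App\ActionInterface;
use Magento\Framework\App\RequestInterface;
use Magento\Framework\App\RouterInterface;
use Magento\Robots\Controller\Index\Index as RobotsAction;

class RobotsTxtRouter implements RouterInterface
{
    private ActionFactory $actionFactory;

    public function __construct(ActionFactory $actionFactory)
    {
        $this->actionFactory = $actionFactory;
    }

    /**
     * @return ActionInterface|null
     */
    public function match(RequestInterface $request)
    {
        // getPathInfo() is defined on the HTTP request class, not on RequestInterface itself
        $path = trim((string) $request->getPathInfo(), '/');

        if ($path !== 'robots.txt') {
            // Not our URL: returning null lets the next router in the list have a go
            return null;
        }

        // Simplification: hand the request straight to the robots action class
        return $this->actionFactory->create(RobotsAction::class);
    }
}
```

A router like this only becomes active once it is added to the router list through di.xml, which is the registration step mentioned above.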

Stage 2: An action

The action class \Magento\Robots\Controller\Index\Index is quite simple: It creates a result page object with the layout handle robots_index_index. And note that the Content-Type is set to text/plain. We're generating a page based on text, not HTML.
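As an illustration, an action along these lines could look roughly as follows. This is a sketch under assumptions, not the core class itself: the namespace is hypothetical and details such as factory arguments are left out. It needs the Magento 2 framework to run.

```php
<?php
declare(strict_types=1);

namespace Vendor\Example\Controller\Index; // hypothetical namespace

use Magento\Framework\App\Action\HttpGetActionInterface;
use Magento\Framework\View\Result\Page;
use Magento\Framework\View\Result\PageFactory;

class Index implements HttpGetActionInterface
{
    private PageFactory $resultPageFactory;

    public function __construct(PageFactory $resultPageFactory)
    {
        $this->resultPageFactory = $resultPageFactory;
    }

    public function execute()
    {
        /** @var Page $resultPage */
        $resultPage = $this->resultPageFactory->create();

        // Load the layout handle that was named in the text
        $resultPage->addHandle('robots_index_index');

        // Plain text output instead of HTML
        $resultPage->setHeader('Content-Type', 'text/plain', true);

        return $resultPage;
    }
}
```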

The result page calls upon the layout.

Stage 3: The layout

The layout handle robots_index_index corresponds to an XML layout file robots_index_index.xml. But instead of extending the normal default layout, the file calls upon the XML page layout robots (view/frontend/page_layout/robots.xml). This makes sure that all regular containers and blocks are gone, with only one container remaining: root.

This root container is then filled with content from a block class Magento\Robots\Block\Data.

Stage 4: The block

The block class Magento\Robots\Block\Data renders non-HTML output via its _toHtml() method. Right. Luckily, it does not extend the template class. Perhaps one of the reasons for this approach is caching: if the page cache is enabled, the /robots.txt page will be cached as well.

The actual output is retrieved from a model \Magento\Robots\Model\Robots.

Stage 5: The model

The model class \Magento\Robots\Model\Robots retrieves a value from the configuration path design/search_engine_robots/custom_instructions, which can be filled in via the configuration. In my case, it is empty.
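Stripped down to its essence, reading that configuration value comes down to something like the sketch below. The class name and namespace are hypothetical, and the use of the website scope is an assumption based on the fact that the value can be overridden per Website. It needs the Magento 2 framework to run.

```php
<?php
declare(strict_types=1);

namespace Vendor\Example\Model; // hypothetical namespace

use Magento\Framework\App\Config\ScopeConfigInterface;
use Magento\Store\Model\ScopeInterface;

class RobotsInstructions
{
    private const XML_PATH = 'design/search_engine_robots/custom_instructions';

    private ScopeConfigInterface $scopeConfig;

    public function __construct(ScopeConfigInterface $scopeConfig)
    {
        $this->scopeConfig = $scopeConfig;
    }

    public function get(): string
    {
        // Assumption: the value is read on website scope.
        // An unset value comes back as null, so cast it to an empty string.
        return (string) $this->scopeConfig->getValue(
            self::XML_PATH,
            ScopeInterface::SCOPE_WEBSITE
        );
    }
}
```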

My head hurts

If you have followed along, you can see that this entire functionality is quite complex, especially considering that the actual output is not dynamically generated, but manually saved to the configuration table. Even worse, in my own case (my development environment, ok) it is empty. Which means that this entire architecture is there to deliver an empty output.

And this took 152ms. Imagine all this CPU power across all of these Magento shops in production, and we have just burned down part of the Amazon forest.

Improvement: Create a file instead

One simple improvement is to throw away the entire module and simply add a file pub/robots.txt to your Magento files and be done with it. This has two benefits: First of all, you'll understand things a lot better. Second, the request gets a lot faster, because any kind of page handling (Magento without FPC, Magento with FPC or Varnish) is replaced with a simple file request (5ms at most).

But then we are lacking the functionality of overriding this value per Website scope.

Improvement: Create a better router instead

Another way might be to create a new router instead. Let that router inject itself with the configuration and simply output that same configuration value when requested. Gone are the action, layout, block and model, and the page speed will still be ok-ish. I'm not sure about the cacheability of this, but at least things are less complex.
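A rough sketch of this idea could look as follows. All names are hypothetical and this is untested. A Magento router has to return an action object, so in this sketch the router doubles as its own action and returns a raw result. The website scope for the configuration value is an assumption as well.

```php
<?php
declare(strict_types=1);

namespace Vendor\RobotsRouter\Controller; // hypothetical module namespace

use Magento\Framework\App\ActionInterface;
use Magento\Framework\App\Config\ScopeConfigInterface;
use Magento\Framework\App\RequestInterface;
use Magento\Framework\App\RouterInterface;
use Magento\Framework\Controller\Result\RawFactory;
use Magento\Store\Model\ScopeInterface;

class Router implements RouterInterface, ActionInterface
{
    private ScopeConfigInterface $scopeConfig;
    private RawFactory $rawFactory;

    public function __construct(ScopeConfigInterface $scopeConfig, RawFactory $rawFactory)
    {
        $this->scopeConfig = $scopeConfig;
        $this->rawFactory = $rawFactory;
    }

    public function match(RequestInterface $request)
    {
        // getPathInfo() is defined on the HTTP request class, not on RequestInterface itself
        if (trim((string) $request->getPathInfo(), '/') !== 'robots.txt') {
            return null; // leave all other URLs to the other routers
        }

        // The router acts as its own action, so no separate controller class is needed
        return $this;
    }

    public function execute()
    {
        // Assumption: same configuration path, read on website scope
        $content = (string) $this->scopeConfig->getValue(
            'design/search_engine_robots/custom_instructions',
            ScopeInterface::SCOPE_WEBSITE
        );

        $result = $this->rawFactory->create();
        $result->setHeader('Content-Type', 'text/plain', true);
        $result->setContents($content);

        return $result;
    }
}
```

Just like the original, this router would still need to be added to the router list via etc/frontend/di.xml. It would also have to run before the Magento_Robots router, unless that module is disabled.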

Anyway, I find this a good example of how Magento can sometimes be overengineered.

About the author

Author Jisse Reitsma

Jisse Reitsma is the founder of Yireo, extension developer, developer trainer and 3x Magento Master. His passion is for technology and open source. And he loves talking as well.
