
You have probably heard about the robots.txt file more than once, or perhaps you have only seen it on your website and want to know what it is and what it is for. Most likely you are not quite sure what it is or how it works. The same happened to me the first time I saw it, which is perfectly normal.

Well, robots.txt is the file that search engines read to find out, among other things, where the sitemaps of our website are and which folders they may or may not index or crawl, since through it we can allow or deny them access to certain parts of our website.
 
Knowing this, you will realize how important it is to have the robots.txt file properly configured. That is why I wrote this article: to teach you, in the simplest possible way, how to create your own robots.txt file and thus control the access of search engine robots to the different parts of your website.
 
It is important to be very clear about what you are changing in this file because, given its function, a misconfiguration could deny access to your articles, which would be very serious, or allow access to places it should not. Below I explain how to make your own robots.txt. I will accompany the explanation with examples, and right at the end I will leave mine in case you want to use it as a base to create yours.
 
To create a good robots.txt file, we must first physically create the file and then define its most important options. The first is to indicate which parts of our website search engines may or may not read. This configuration can be different for each search engine or the same for all of them. The normal thing is to use the same for all, since it is always best to move towards universal formats, and that is the one we are going to describe here. For this we use the following syntax:
 


To indicate that the configuration that follows applies to all search engines, we write:

 
User-agent: *
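
The same file can also hold a separate group of rules for one particular search engine. As a rough sketch of how that plays out, Python's standard urllib.robotparser module can tell us which group a given crawler ends up following. The /private/ folder, the page names and the sample rules are invented for the illustration, they are not part of my file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group only for Googlebot, one for everyone else.
SAMPLE = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /administrator/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` according to SAMPLE."""
    parser = RobotFileParser()
    parser.parse(SAMPLE.splitlines())
    return parser.can_fetch(agent, "http://www.dominio.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot"):
        for path in ("/private/page.html", "/administrator/index.php"):
            print(agent, path, allowed(agent, path))
```

Notice that a crawler with its own group follows only that group, so in this sample Googlebot is not affected by the rules written under User-agent: *. That is one more reason to keep a single universal group unless you really need something different.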

To block search engine access to certain folders, the syntax we should use is the following:

Disallow: /FolderPath/

Example:

Disallow: /administrator/

With this, search engines will not crawl what is in the administrator folder, so its contents will not be indexed.
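
A Disallow rule works as a prefix of the URL path: everything that starts with that text is blocked. The following is only a minimal sketch of that idea in Python, without wildcards, and the file names used in it are hypothetical.

```python
# Prefixes taken from the Disallow example above.
DISALLOWED = ["/administrator/"]


def is_blocked(path: str, disallowed: list[str]) -> bool:
    """Return True if `path` starts with any of the disallowed prefixes."""
    return any(path.startswith(prefix) for prefix in disallowed)


if __name__ == "__main__":
    # Hypothetical paths, used only to illustrate the matching.
    print(is_blocked("/administrator/index.php", DISALLOWED))
    print(is_blocked("/administrator-guide.html", DISALLOWED))
    print(is_blocked("/blog/my-article.html", DISALLOWED))
```

The second path is not blocked because it does not start with /administrator/ including the final slash, which is why it is worth writing the closing slash when you want to block a folder and nothing else.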
 
If, on the contrary, we want to enable access, we have two options: either put nothing at all, or use the following syntax:

Allow: /FolderPath/

Example:

Allow: /images/

This tells search engine spiders that they may index the images contained in the images directory.
 
You may be asking yourself: if nothing needs to be added to allow access to a folder, since it is enough simply not to deny it, why is there a specific rule to allow it?

Very easy. It can happen, for example, that inside the administrator folder, to which we denied access earlier, there is a folder that we do want search engines to reach. In that case we deny general access to the administrator folder as we have already done and then, once general access has been denied, we allow access to the folder inside it, which in this case we will call photos. This is how we should write it:
 
Example:

Disallow: /administrator/
Allow: /administrator/photos/

In this way, the only part of the administrator folder that search engines can index is the photos folder. If we had not added the specific Allow rule for the photos folder, indexing of all the contents of the administrator folder, including the photos folder, would be blocked.
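
To see why the Allow rule wins inside a blocked folder, here is a small Python sketch of the "most specific rule wins" behaviour that Google documents and that RFC 9309 describes: among the rules that match, the longest one decides. It is a simplification of mine, with plain prefixes and no wildcards, and the file names are hypothetical. I do not use Python's urllib.robotparser here because, as far as I know, it applies the rules in the order they are written instead.

```python
# The two rules from the example above, as (kind, path prefix) pairs.
RULES = [
    ("disallow", "/administrator/"),
    ("allow", "/administrator/photos/"),
]


def can_crawl(path: str, rules=RULES) -> bool:
    """Apply the longest matching rule; allow the path if no rule matches."""
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        # On a tie in length, prefer Allow (the least restrictive rule).
        tie_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    # Hypothetical files, used only to illustrate the result.
    print(can_crawl("/administrator/photos/logo.png"))
    print(can_crawl("/administrator/config.php"))
    print(can_crawl("/index.html"))
```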
 
Last but not least, we must specify the address of our sitemap, usually called sitemap.xml, so that search engines can find it and thus index our articles faster. Just add the following line at the end of the robots.txt file:

Syntax:

Sitemap: http://www.dominio.com/sitemap.xml
 
In the case of my website, the line would be:

Sitemap: http://www.windowslinuxymac.com/sitemap.xml
 
Keep in mind that a sitemap.xml file cannot exceed 50 MB uncompressed (the limit was 10 MB in older versions of the sitemap protocol) or contain more than 50,000 URLs. If this is your case, divide the links into two or more sitemaps and list them in the robots.txt file as follows:

Syntax:

Sitemap: http://www.dominio.com/sitemap.xml
Sitemap: http://www.dominio.com/sitemap2.xml
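
If you are not sure how many sitemaps you need, the calculation is simple. This Python sketch only looks at the limit of 50,000 URLs per file and ignores the size limit. The function name and the naming scheme sitemap2.xml, sitemap3.xml and so on are my own assumptions, taken from the example above.

```python
MAX_URLS = 50_000  # per-file URL limit mentioned in the text


def sitemap_lines(total_urls: int, base: str = "http://www.dominio.com") -> list[str]:
    """Return the Sitemap lines needed to cover `total_urls` URLs."""
    # Ceiling division, with at least one sitemap file.
    files = max(1, -(-total_urls // MAX_URLS))
    names = ["sitemap.xml"] + [f"sitemap{i}.xml" for i in range(2, files + 1)]
    return [f"Sitemap: {base}/{name}" for name in names]


if __name__ == "__main__":
    # A hypothetical site with 120,000 URLs.
    for line in sitemap_lines(120_000):
        print(line)
```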
 
In the end, if you have done everything correctly, the file should look something like this:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
 
Sitemap: http://www.windowslinuxymac.com/sitemap.xml
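
Before uploading the file it does not hurt to check that it says what you think it says. As an optional sketch, Python's standard urllib.robotparser module can read the text of the file and answer questions about it. The two test paths are hypothetical, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.windowslinuxymac.com"

# The finished file from this article, written without stray spaces.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/

Sitemap: http://www.windowslinuxymac.com/sitemap.xml
"""


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = load(ROBOTS_TXT)
    # Hypothetical paths: one inside a blocked folder, one article.
    print(parser.can_fetch("*", SITE + "/administrator/index.php"))
    print(parser.can_fetch("*", SITE + "/my-article.html"))
    print(parser.site_maps())
```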

Anyway, I hope my article has helped you. I know that at first it may seem intimidating, but it is simpler than it looks, especially considering that if your website is built with a CMS (content management system) such as Joomla or WordPress, it will already have a robots.txt file that you only need to modify a little, for example by adding the address of your sitemap, since none is usually included by default. Do not forget to leave your opinions and questions in the comments. Thank you very much for your visit, and I hope you come back soon.

