What is robots.txt

robots.txt is a plain-text file placed at the root of a website that asks web crawlers not to crawl parts of the site. Nothing enforces it in any way, but it is a de facto standard that most reputable search engines and web crawlers respect.
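As a rough illustration of what such a request looks like, the sketch below writes a sample file that asks every crawler to stay out of two directories. The file name robots.example.txt and the paths /private/ and /drafts/ are made up for the example and are not part of my site.

```shell
# Illustrative only: /private/ and /drafts/ are hypothetical paths.
cat > robots.example.txt <<'EOF'
# Ask every crawler to stay out of two directories.
User-agent: *
Disallow: /private/
Disallow: /drafts/
EOF

# Show the result.
cat robots.example.txt
```

A crawler that honours the file skips URLs under those two paths; a crawler that ignores it is not stopped by anything here.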

How to add it to Jekyll

For my purposes I don’t need to restrict any part of my website, so I will allow everything.

Adding this to a Jekyll site is quite easy.

  • First, create a robots.txt file in the root of your site.
  • In this file, add the following:
    User-agent: *
  • Lastly, rebuild the site.
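If you would rather not open an editor, the file can probably be written in one command. This is a minimal sketch; SITE_DIR is a name I am introducing for the example, and it is assumed to point at the Jekyll source root (it defaults to the current directory).

```shell
# Assumption: SITE_DIR is the Jekyll source root (hypothetical variable).
SITE_DIR="${SITE_DIR:-.}"

# Write the single-line allow-all file described in the steps above.
printf 'User-agent: *\n' > "$SITE_DIR/robots.txt"

# Optional, and not what the steps above use: many sites also add an
# explicit empty Disallow line to spell out that nothing is blocked.
# printf 'User-agent: *\nDisallow:\n' > "$SITE_DIR/robots.txt"

# Confirm what was written.
cat "$SITE_DIR/robots.txt"
```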

root@ubuntu-512mb-sfo1-01:~/www# vim robots.txt
root@ubuntu-512mb-sfo1-01:~/www# ls
404.html  about  assets  cd  certbot.log  _config.yml  feed.xml  Gemfile  Gemfile.lock  _includes  index.md  jekyll  _layouts  LICENSE  _posts  README.md  robots.txt  _sass  _site  sript  thumbnails
root@ubuntu-512mb-sfo1-01:~/www# jekyll build
Configuration file: /root/www/_config.yml
            Source: /root/www
       Destination: /root/www/_site
 Incremental build: disabled. Enable with --incremental
                    done in 2.593 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
root@ubuntu-512mb-sfo1-01:~/www# rm -rf /var/www/html/*
root@ubuntu-512mb-sfo1-01:~/www# cp -rf ~/www/_site/* /var/www/html/
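A build should normally copy robots.txt into _site along with the other static files, but it is cheap to check before copying anything to the web root. The helper below is a sketch and not the exact commands I ran; deploy_site is a hypothetical function name, and unlike the rm -rf step in the session it does not remove stale files from the destination.

```shell
# deploy_site SRC DEST: copy a built site only if it contains robots.txt.
# deploy_site is a hypothetical helper, not part of Jekyll.
deploy_site() {
  src="$1"
  dest="$2"
  # Refuse to deploy a build that is missing robots.txt.
  if [ ! -f "$src/robots.txt" ]; then
    echo "robots.txt missing from $src; not deploying" >&2
    return 1
  fi
  mkdir -p "$dest"
  # "src/." copies the directory contents, including dotfiles.
  cp -R "$src"/. "$dest"/
}

# Example call, using the paths from the session above:
# deploy_site ~/www/_site /var/www/html
```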
  • To test, curl the site:
root@ubuntu-512mb-sfo1-01:~/www# curl https://invoke.coffee/robots.txt -v
*   Trying 2604:a880:1:20::3085:5001...
* Connected to invoke.coffee (2604:a880:1:20::3085:5001) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=www.invoke.coffee
*  start date: Oct  4 11:07:31 2017 GMT
*  expire date: Jan  2 11:07:31 2018 GMT
*  subjectAltName: host "invoke.coffee" matched cert's "invoke.coffee"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
> GET /robots.txt HTTP/1.1
> Host: invoke.coffee
> User-Agent: curl/7.52.1
> Accept: */*
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Sat, 14 Oct 2017 05:17:05 GMT
< Content-Type: text/plain
< Content-Length: 24
< Last-Modified: Sat, 14 Oct 2017 05:10:20 GMT
< Connection: keep-alive
< ETag: "59e19c3c-18"
< Accept-Ranges: bytes
User-agent: *
* Curl_http_done: called premature == 0
* Connection #0 to host invoke.coffee left intact
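The verbose output is mostly TLS noise, so a scripted check may be easier to read. The sketch below assumes it is enough to confirm that the served file contains a wildcard User-agent line; has_wildcard_group is a hypothetical helper name, and it reads the file body from standard input.

```shell
# has_wildcard_group: succeed if stdin contains a "User-agent: *" line.
# Matching is case-insensitive and tolerates surrounding whitespace.
has_wildcard_group() {
  grep -Eiq '^[[:space:]]*user-agent[[:space:]]*:[[:space:]]*\*[[:space:]]*$'
}

# Example call against the live site (requires network access):
# curl -fsS https://invoke.coffee/robots.txt | has_wildcard_group && echo "robots.txt OK"
```

The -f flag makes curl exit with an error on an HTTP failure such as a 404, so an error page is never piped into the check.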