SEO Black Holes

In the article http://tantek.com/2015/069/t1/js-dr-javascript-required-dead Tantek Çelik states that content that is not "curlable" is not on the web.

Tantek also encourages us to check it out for ourselves. So that's what this is.

And how do I do that?
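One way to script the check is to fetch the raw HTML without running any scripts and look for a sentence in it, which is roughly what curl followed by a text search would do. This is only a minimal sketch: the helper names are hypothetical, the tag stripping is deliberately crude, and `checkUrl` assumes Node 18 or later where `fetch` is built in.

```javascript
// Hypothetical helper: reduces raw HTML to roughly the text a client
// that does not run scripts would see.
function visibleText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop script bodies
    .replace(/<[^>]+>/g, " ")                    // drop remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
}

// True if the sentence is present in the markup itself, with no JavaScript executed.
function isCurlable(html, sentence) {
  return visibleText(html).includes(sentence);
}

// Fetches a page without executing its scripts and checks for the sentence.
async function checkUrl(url, sentence) {
  const response = await fetch(url);
  return isCurlable(await response.text(), sentence);
}
```

A sentence that only appears after a script has run will make `isCurlable` return false, which is exactly the case these tests are about.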

So, how'd it go?

The line is searchable on Google. Unfortunately this site is not being indexed by Bing yet. I'll have to fix that before I can test it.

I have a few things I still want to test. Will the async attribute affect the search engines' ability to index the site effectively?

JavaScript files should never contain text content, as they do in the previous two tests. A more interesting case is how search engines index content added with XMLHttpRequest.
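A test of that kind could look roughly like the sketch below: the sentence lives in a separate file and only reaches the page once an XMLHttpRequest has completed. The function name, the element id, the file path and the injectable `XhrClass` parameter are all assumptions made for illustration, not the code this page actually runs.

```javascript
// Loads text from `url` and writes it into `target` once the request succeeds.
// `XhrClass` defaults to the browser's XMLHttpRequest; it is a parameter only
// so the sketch can be exercised outside a browser.
function loadTextInto(target, url, XhrClass = globalThis.XMLHttpRequest) {
  const request = new XhrClass();
  request.open("GET", url, true); // true = asynchronous
  request.onload = function () {
    if (request.status >= 200 && request.status < 300) {
      target.textContent = request.responseText;
    }
  };
  request.send();
}

// In a browser (hypothetical id and path):
// loadTextInto(document.getElementById("xhr-test"), "/tests/xhr-line.txt");
```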

SEO is not everything though. What do the pages look like to the Wayback Machine?

When I submitted this page on June 13, 2015 it looked like this. The Wayback Machine will cache all files, including the file requested through an XMLHttpRequest.

And then what?

Google has so far successfully indexed all three of my tests. Bing has yet to crawl this page, though I submitted the sitemap to them two weeks ago.

It is nice to see that content added through some basic JavaScript actions or XMLHttpRequest can still be crawled, so adding it that way won't hurt SEO.

So far all actions have been completed before the window.onload event has fired. I have yet to test content that is the product of a user action, too. We've just scratched the surface.

The document.DOMContentLoaded event could very well be triggered before any asynchronous scripts have had the chance to load. I'm certain this line will be indexed without any problem, but let's add the test for good measure.

Let's make an XMLHttpRequest after the window.onload event.

Just as the document.DOMContentLoaded event can trigger before scripts with the async attribute are loaded, so can the window.onload event. The tests regarding those two events are all synchronous.
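If a test script were loaded with async instead, a listener registered after the event has fired would never run. A common guard is to check document.readyState first. The helper name below is hypothetical, and the `doc` parameter is passed in only so the sketch does not depend on a global document.

```javascript
// Runs `callback` once the DOM is parsed, even if DOMContentLoaded has already
// fired by the time this (possibly async) script executes.
function whenDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: the event is yet to come.
    doc.addEventListener("DOMContentLoaded", callback, { once: true });
  } else {
    // "interactive" or "complete": the document has already been parsed.
    callback();
  }
}

// In a browser (hypothetical id and sentence):
// whenDomReady(document, function () {
//   document.getElementById("ready-test").textContent = "Added when the DOM was ready.";
// });
```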

Progressive enhancement is something I'm usually very strict about. Let's throw that out the window.

Notice how the "Read more"-link doesn't have a title attribute. It doesn't have a working href. It's just awful. Go ahead and click it.

The Wayback Machine will cache, as far as I know, all resources on a page. I don't know when they started to do this, but it's something I'm very excited about. If you visit old archived sites the images are often missing, which is sad.

In this last test the Wayback Machine will save the script, that we know. The user action will result in an XMLHttpRequest loading more content. Will the Wayback Machine know how to save that content? Or have we reached the limit of what today's spiders can crawl? It's still a very commonly occurring thing. Click the link.

DuckDuckGo and Bing

DuckDuckGo and Bing have found their way over here. After the impressive performance of the Wayback Machine and Google I was actually a bit surprised when they didn't index my most basic test, even though at first I didn't expect any of the spiders to do well.

In my first tests I included a short timeout. I didn't think a JavaScript file of a few bytes doing one single thing was a realistic case. And maybe a little bit because I wanted them to fail... For the benefit of DuckDuckGo and Bing I'll include two even more basic tests.

The same script inline instead of as an external resource. And without the self-executing function.
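To make the difference concrete, here is a rough sketch of both shapes: the earlier one with a short timeout inside a self-executing function, and the more basic one without either. The function names, element ids, sentence and delay are assumptions for illustration; the real test scripts are not reproduced here.

```javascript
// Shared step: put the test sentence into the target element.
function addLine(target, text) {
  target.textContent = text;
}

// Shape of the earlier tests as described: a short timeout before adding the line.
// `schedule` defaults to setTimeout and is a parameter only to ease testing.
function addLineAfterDelay(target, text, delayMs, schedule = setTimeout) {
  schedule(function () {
    addLine(target, text);
  }, delayMs);
}

// Earlier shape, in an external file (hypothetical id and delay):
// (function () {
//   addLineAfterDelay(document.getElementById("timeout-test"), "Added by JavaScript.", 100);
// })();
//
// More basic shape, inline in the page and without the wrapper:
// addLine(document.getElementById("inline-test"), "Added by JavaScript.");
```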

I did a new Wayback Machine test on Jun 30, 2015 of the user actions. It looks like the Wayback Machine wasn't able to crawl the second user action test at first. When I clicked the link on the cached page the script would request a nonexistent resource; the Wayback Machine would in turn request the same resource from this site and then save it. This is at least what I think I saw when looking at the network tab in Chrome Developer Tools. And I can't think of any other way for it to have worked.

Google hasn't crawled the last update of this page yet. It will be interesting to see how Google handles the same case.

Bad links don't work

Google did not manage to crawl the content from the "Read more"-links, as was expected. How can we set up a user interaction in a way that crawlers would understand?

To fix this we can use good old progressive enhancement. The next link returns only the requested fragment when it is loaded with XMLHttpRequest, and the whole page with the requested content included on a normal request.
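On the server side that decision can be sketched as below. It assumes the client script sets an X-Requested-With: XMLHttpRequest header on its requests, which is a convention used by many libraries and not something the browser adds by itself. The `renderPage` template and the `respond` function are hypothetical stand-ins, not this site's actual code.

```javascript
// Hypothetical page template; the real site's markup is not shown in this post.
function renderPage(fragmentHtml) {
  return "<!DOCTYPE html><html><body><main>" + fragmentHtml + "</main></body></html>";
}

// Returns only the fragment for XMLHttpRequest calls and the full page otherwise.
// Assumes header names are already lowercased, as Node's http module does.
function respond(headers, fragmentHtml) {
  const requestedWith = (headers["x-requested-with"] || "").toLowerCase();
  const isXhr = requestedWith === "xmlhttprequest";
  return isXhr ? fragmentHtml : renderPage(fragmentHtml);
}
```

A crawler that simply follows the href gets the full page, so the content is reachable without running any script, while the enhanced click only pays for the fragment.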

This line is progressively enhanced.

I've learned a lot doing these tests. I've learned, for example, that Google's crawler is a lot more sophisticated than I thought. And that Google has support for something they call pretty AJAX URLs. Basically, if the crawler encounters a URL containing #! it will replace that with _escaped_fragment_ and visit that page: #!stuff becomes ?_escaped_fragment_=stuff.
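The rewrite the crawler performs can be sketched as a small function. This is a simplified illustration with a hypothetical function name: it appends with & when the URL already has a query string, and it percent-encodes only a handful of characters (%, #, & and +), so treat the exact escaping rules as an assumption rather than a faithful copy of Google's scheme.

```javascript
// Rewrites a "#!" URL into the "_escaped_fragment_" form described above.
// Returns the URL unchanged when it contains no "#!".
function toEscapedFragmentUrl(url) {
  const index = url.indexOf("#!");
  if (index === -1) return url;
  const base = url.slice(0, index);
  const fragment = url.slice(index + 2);
  // Simplified escaping: only characters that would break the query string.
  const escaped = fragment.replace(/[%#&+]/g, (ch) => encodeURIComponent(ch));
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

// Example with a placeholder domain:
// toEscapedFragmentUrl("http://example.com/#!stuff")
// gives "http://example.com/?_escaped_fragment_=stuff"
```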

I've obviously never used it before, so let's try it out. Google's user guide for pretty AJAX URLs asks for an HTML snapshot of what the fully JavaScript-rendered page looks like. But I don't care for it, so I'll handle it like I did the progressively enhanced link.

Yandex

Something very exciting has happened: this page has been indexed by Yandex. The big question is, how many of these tests pass in Yandex? The answer is: none. Not even the progressively enhanced link has been indexed.

In my progressively enhanced test the link is relative and it is also a search query. I wonder if those are two variables that hinder Yandex. We could still create a simpler test, I suppose.

To circle back to DuckDuckGo, it is doing quite fine these days. I'm not sure if it's as effective as Google; I will have to compare which tests each of them passes and fails.

And finally, I really want to test Baidu also. I need to figure out how to get indexed by it.

Images

Text is fun and all, but an image says more than 1000 words. So what does your trivial little non-indexed sentence matter?

Images are however very heavy. We rely on lazy loading and responsive image loaders and whatever we need to fix the web weight problem. That's all very good. Can we do it without it hurting our SEO and, much more importantly, SEM?

So far only Google has indexed the images on the landing page of pstenstrm.se. DuckDuckGo, Bing and Yandex are all falling short, and I don't know if there's anything I can do to encourage them.

I have a few image tests planned, but I will start off simple. For all tests I use the same image of a sunken boat, and in each image I have a word to identify it in search results. First off are a control image, which is a regular <img> tag, a <meta property="og:image"/itemprop="image"> and an <img> added through simple JavaScript.
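The JavaScript variant of that test can be as small as the sketch below. The function name, image path, alt text and element id are placeholders, and `doc` is passed in as a parameter only so the snippet does not depend on a global document.

```javascript
// Creates an <img> and appends it to `parent`. The alt text carries the word
// used to find the image in search results.
function addImage(doc, parent, src, alt) {
  const image = doc.createElement("img");
  image.src = src;
  image.alt = alt;
  parent.appendChild(image);
  return image;
}

// In a browser (hypothetical path, alt text and id):
// addImage(document, document.getElementById("image-tests"), "/img/boat-js.jpg", "Sunken boat, JS test");
```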

Responsive images

So far only Google has indexed the images in the last test. It has indexed all images except for the <meta itemprop/og:image> image. The images appear in search results if I search for the text in the alt attribute. If I only search for the address the only image that appears is the JS loaded image.

The only image I was unsure would be indexed is the one inside a <noscript> element. It's really good to know that it's indexed, for when I develop responsive image solutions. Now I know that Google will index images that are loaded through a responsive image solution. In image search you will get results for the different sized images. I do not however expect other search engines to index images better than they do text. For those the <noscript> test is still the most interesting. For DuckDuckGo and Bing however the JS/responsive image test is still very important.

In the following test one image will load below 900px, another between 900px and 2000px and a last one for screen sizes above 2000px.
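The breakpoint logic behind such a loader can be sketched as a pure function. The function name, the `sources` object and the file names in the usage comment are assumptions; the breakpoints follow the description above, with exactly 900px and 2000px both falling into the middle range.

```javascript
// Picks an image URL for a viewport width, using the three ranges from the text:
// below 900px, 900px to 2000px, and above 2000px.
function pickImageSource(viewportWidth, sources) {
  if (viewportWidth < 900) return sources.small;
  if (viewportWidth <= 2000) return sources.medium;
  return sources.large;
}

// In a browser (hypothetical id and file names):
// const image = document.getElementById("responsive-test");
// image.src = pickImageSource(window.innerWidth, {
//   small: "/img/boat-small.jpg",
//   medium: "/img/boat-medium.jpg",
//   large: "/img/boat-large.jpg"
// });
```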

I still don't have anything indexed on Baidu. I'm trying to get access to its webmaster tools so I can submit this address. The simplest way to do that I've found is to register for Weibo, but for that I need a US phone number to receive a verification text. There are other phone number options, but I think getting a US number will be the easiest. It is however a bit harder to find a reliable and free option. And Weibo only allows 2 sent SMS messages during a period of time. So it's time consuming to find a proper service.

Nordic.js lightning talk

I just talked about this at Nordic.js. Talk to me about it on Twitter @perstenstrom.

Make it easier, then harder

I've thought about whether I could come up with an even more basic test than my previous attempt. It really doesn't matter if any of these simple tests work, because if this is all your JavaScript did you really don't need JavaScript. This new test is the first paragraph of the page.

One of the most important challenges with building websites, beyond getting users to visit them, is performance. One of the biggest challenges with performance is images. Improving your image handling is one of the biggest and easiest performance gains you can get. One improvement is, for example, lazy loading images; it could make the initial load of your page a lot faster. Then, when the user scrolls down the page, images are loaded as they approach the visible screen. But do search engines scroll? Will the lazy loaded images be indexed?
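A scroll-based lazy loader of the kind described can be sketched like this. It assumes each image carries its real address in a data-src attribute and is swapped in once it comes within a margin of the viewport. The function names, the 200px default margin and the wiring in the comments are illustrative assumptions, not the loader used on this site.

```javascript
// True when the element overlaps the viewport extended by `margin` pixels
// above and below, i.e. when it is visible or about to become visible.
function shouldLoad(rect, viewportHeight, margin) {
  return rect.top < viewportHeight + margin && rect.bottom > -margin;
}

// Swaps data-src into src for every image that is close to the viewport and
// returns the images that were loaded in this pass.
function loadVisibleImages(images, viewportHeight, margin = 200) {
  return images.filter(function (image) {
    const pending = image.dataset.src;
    if (pending && shouldLoad(image.getBoundingClientRect(), viewportHeight, margin)) {
      image.src = pending;
      delete image.dataset.src; // mark as done so it is not loaded twice
      return true;
    }
    return false;
  });
}

// In a browser (hypothetical wiring):
// const images = Array.from(document.querySelectorAll("img[data-src]"));
// window.addEventListener("scroll", () => loadVisibleImages(images, window.innerHeight));
// loadVisibleImages(images, window.innerHeight); // initial pass for images already in view
```

A crawler that never scrolls and never fires the scroll handler would only ever see the images picked up by the initial pass, which is the question this test is meant to answer.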