Recently, I launched the "What I Saw" web site, which was supposed to make sure that all the content available via my mobile apps is indexed by Googlebot and can be searched and found. Over time, the mobile app has accumulated close to 3k pieces of content. There is no particular theme to the posts -- it's really about anything and everything. As such, I hoped the probability of someone stumbling upon the web site, and then downloading the mobile app, would be high.
What I've built is a very simple Single Page App (SPA) written in React. The technology pick was natural -- the corresponding mobile apps are written in React Native. My intention was to put to the test the "learn once, write anywhere" mantra behind React. Every page contains an image, some comments, "next" and "prev" buttons to navigate around the feed, and links to the app stores which drive people to download the mobile apps. What can possibly go wrong with such a simple app, right? Well, actually, I learned a lot of things that I wish I had known before I started -- it would have saved me a lot of time and grief. In this post I will share some of my experiences. Hopefully it will save some time for other developers who are just about to launch the next killer app that will change the world.
So, I had just launched my app, and had absolutely nothing else to do but sit and wait for incoming traffic.
Google says it may take between 4 days and 4 weeks to crawl and index the pages on your site. There is a huge difference between 4 days and 4 weeks, and I guess no one really knows how Google decides when your site will be crawled. In my case, I was visiting Google Search Console every day, a few times per day, for at least 2 weeks. And finally, one day, Google started to report that it had tried to crawl. Unfortunately, all the pages Google attempted to crawl failed with "404 not found".
It was a disaster -- I was already mentally preparing to address scalability issues once I started getting a flood of visitors who had found my site organically. Instead, two weeks after launching the site, nothing had happened, and I felt cheated. That was the first time I tried the URL Inspection tool provided by Google Search Console. Indeed, according to the URL Inspection tool, every single URL on my site except the home page was returning 404. And yet, when I went to the same URL directly in the browser, it showed me the right content and everything seemed to be working correctly.
What was going on? As usual in a case like this, I rolled up my sleeves and started digging around for a potential solution. Most of the search results related to my issue said that Google is incapable of crawling SPAs, and that the right solution was to write an isomorphic app, which would support client-side as well as server-side rendering. It was a bit unexpected. It looked like I had to learn yet another concept, pick one more framework, and, basically, rewrite my simple app one more time. I found something that looked promising -- a server-side rendering npm package for React called Next.js. The more I read about Next.js, the more I liked it, but I also realized that, despite its simplicity, it would be a major effort for me to get it off the ground. I really didn't feel like doing another rewrite. When I was younger, I would have just done the rewrite, but years of experience have taught me that a good developer is the lazy one -- the one who spends more time on research, but finds the simplest possible solution in the end.
I did run into a few links which claimed that Google is capable of crawling SPAs. Not a whole lot -- most still insisted that I needed to support server-side rendering -- but it was enough for me to challenge myself to find a solution that would not require a rewrite.
Next, I turned to good old Chrome Developer Tools. When navigating to the entry point of my app, https://www.wisaw.com, I saw a bunch of network requests, as expected, and at the end the page would render correctly. When I navigated around the site by clicking on the "Next" or "Prev" links, everything seemed to work just as expected, with no "404 not found" errors. However, Google's URL Inspection tool kept insisting that every single page was returning 404. I double-checked the URLs, and even tried to copy and paste them to minimize the possibility of errors. Always the same result -- the page looked good in the browser, but Google's URL Inspection failed with 404. I really started to suspect that this might be a bug in Google, and that I should wait for it to be fixed.
And then, instead of navigating around the site by clicking on the links, I tried to go to a particular URL directly and analyze it with Chrome DevTools. I could not believe what I was seeing -- the first network request would fail with "404 not found", but then, somehow (I still do not know how Chrome was able to do that), it would still download all the necessary JS, execute it, and load and render the page I was requesting just fine.
It turns out that Google's URL Inspection tool was right all along. So next I had to figure out why going to a URL directly returned 404 while the browser still rendered the page correctly. I continued learning more about how an SPA works, and the answer was right on the surface -- an SPA has just a single entry point, the home page URL, which loads all the JavaScript and executes it. From that point on, when you navigate around by clicking links on the page, it simply replaces DOM fragments and manipulates the browser URL and history, but never reloads the whole page. The URLs shown in the address bar therefore never have to exist on the server, which is why asking the server for one of them directly came back as 404. Kind of obvious, and I knew all that from learning React, but it was a nice refresher.
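To make the mechanism concrete, here is a minimal sketch of client-side routing in plain JavaScript. It is not the code of my app (a routing library would normally do this for you), and the route paths, `renderRoute`, and `navigate` are hypothetical names made up for illustration. The only real browser API it relies on is `history.pushState`, which changes the address bar without sending a request to the server.

```javascript
// Hypothetical route table: each entry maps a URL pattern to a "view".
// Views return plain strings here to keep the sketch free of DOM details.
const routes = [
  { pattern: /^\/$/, view: () => "home feed" },
  { pattern: /^\/photos\/(\d+)$/, view: (id) => `photo ${id} with comments` },
];

// Pick the view for a path entirely in the browser; the server is not asked.
function renderRoute(pathname) {
  for (const { pattern, view } of routes) {
    const match = pattern.exec(pathname);
    if (match) return view(...match.slice(1));
  }
  return "not found";
}

// What a click on "Next" or "Prev" boils down to in an SPA.
// `historyApi` is expected to behave like window.history, and `mount`
// is whatever puts the rendered view on the page.
function navigate(pathname, historyApi, mount) {
  // Updates the address bar and history stack without an HTTP request.
  historyApi.pushState({}, "", pathname);
  mount(renderRoute(pathname));
}
```

In a browser this could be wired up as `navigate("/photos/42", window.history, (text) => { document.getElementById("root").textContent = text; })`, assuming the page has an element with the id `root`. Note that `/photos/42` only ever exists inside this script, so a crawler that requests it straight from the server gets whatever the server decides to return for an unknown path.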
Frankly, I was not able to find any meaningful explanation of why the browser still rendered my URLs even though the initial request came back as 404. My only guess is that somehow it was able to use a cached version of the JS, but I may be wrong (if anybody can explain to me why and how it works, please reach out and share).
The solution I found was a bit counterintuitive, but very simple. I had to configure my web server to always return 200, even when it would normally return 404. In addition, whenever it receives a request for a resource that cannot be found on the server, it returns /index.html instead.
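How you express this rule depends entirely on your web server or hosting provider, so treat the following as a sketch of the idea rather than my actual configuration. It is a tiny static file server for Node.js that uses only the built-in `http`, `fs`, and `path` modules. The names `resolveFile` and `createSpaServer`, the `./build` directory, and the port are assumptions made for illustration.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Decide which file to serve for a requested URL.
// `fileExists` is passed in so the rule can be checked without touching disk.
function resolveFile(requestUrl, fileExists) {
  // Drop the query string and normalize the path.
  const pathname = new URL(requestUrl, "http://localhost").pathname;
  const candidate = pathname === "/" ? "/index.html" : pathname;
  // Real static assets (JS bundles, images, CSS) are served as they are.
  if (fileExists(candidate)) return candidate;
  // Everything else is assumed to be a client-side route: serve the SPA shell.
  return "/index.html";
}

// A small, deliberately incomplete content-type table.
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".png": "image/png",
  ".jpg": "image/jpeg",
};

// Build (but do not start) a server that applies the rule above to `rootDir`.
function createSpaServer(rootDir) {
  const root = path.resolve(rootDir);
  const fileExists = (relativePath) => {
    const full = path.join(root, relativePath);
    // Refuse anything that would resolve outside the root directory.
    if (!full.startsWith(root + path.sep)) return false;
    return fs.existsSync(full) && fs.statSync(full).isFile();
  };
  return http.createServer((req, res) => {
    const file = resolveFile(req.url, fileExists);
    const type = CONTENT_TYPES[path.extname(file)] || "application/octet-stream";
    // Always 200: unknown paths get index.html instead of a 404.
    res.writeHead(200, { "Content-Type": type });
    fs.createReadStream(path.join(root, file))
      .on("error", () => res.end())
      .pipe(res);
  });
}
```

Starting it would look like `createSpaServer("./build").listen(8080)`. With this in place, a direct request for a deep link gets index.html with status 200, the JavaScript boots, and the client-side router renders the right page.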
As soon as I applied this change to the web server config -- voila, everything started to work as expected. The Google URL Inspection tool was happy. The pages loaded whether I navigated to them via links or went to their URLs directly, and Chrome DevTools was no longer reporting 404 on initial page loads.
It's been almost a month since I launched my first SPA. The lesson learned: if I had tried Google URL Inspection (formerly known as "Fetch as Google") as soon as I launched the app, and if I had actually trusted it from the very beginning instead of trying to convince myself that there was a bug in Google, I could have cut my wait time at least in half.
It's been almost 4 weeks now, and I'm still waiting for the flood of incoming requests so that I can finally start addressing the anticipated scalability issues...