Last month, I was asked to investigate why content that a website owner thought was protected behind their website’s “Members Only” area was showing up on Google, much to their horror. There are several ways of fixing this problem, and I thought I’d share one with you.
In this particular case, the website’s “Members Only” area was being protected using the WP-Members plugin, although the same problem would have occurred with WordPress’ built-in “protected post” feature.
Child Pages
The Members Only page was correctly “blocked” (WP-Members plugin terminology), requiring a user to be logged in, but that’s where the problems began.
Although the Members Only page was correctly blocked, the pages deeper in the site were not. I can only assume, but it looks like whoever set up the pages figured that because these pages couldn’t be reached from anywhere on the site except the protected page, all was good. This is not the case. Anyone who knew the direct URL of a sub-page could read its content as an anonymous user. Worse, the URLs of these pages were published in the site’s sitemaps, which made it very easy for search engines like Google to find the content.
Lesson: Make sure all pages that should be hidden from public view are protected.
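One way to act on this lesson is to audit the sitemap itself. Below is a minimal Python sketch, assuming your members-only pages share a common path prefix; the `/members-only/` prefix and the `find_exposed_urls` helper are hypothetical, not part of WP-Members or WordPress. It lists every sitemap URL under that prefix, and each one should then be opened in a logged-out browser to confirm it really is blocked.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_exposed_urls(sitemap_xml: str, private_prefixes: List[str]) -> List[str]:
    """Return sitemap URLs whose path starts with one of the private prefixes."""
    root = ET.fromstring(sitemap_xml)
    exposed = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        path = urlparse(url).path
        if any(path.startswith(prefix) for prefix in private_prefixes):
            exposed.append(url)
    return exposed


# Hypothetical sitemap for a site whose members area lives under /members-only/
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/about/</loc></url>
  <url><loc>http://example.com/members-only/</loc></url>
  <url><loc>http://example.com/members-only/meeting-minutes/</loc></url>
</urlset>"""

for url in find_exposed_urls(SAMPLE_SITEMAP, ["/members-only/"]):
    print("check while logged out:", url)
```

Many SEO plugins split the sitemap into an index of child sitemaps; this sketch only handles a single `<urlset>` file.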
Attachments
Even if your page or post is protected, any files you upload to the media library and attach to it are still accessible to anyone who knows their address. In this case, the site had lots of PDF documents that were thought to be accessible only to “members”, since they were linked from pages in the Members Only area.
Blocking access to content is done in WordPress through PHP code that checks who you are, whether you have the right permissions, and so on. When your web browser loads images, PDFs, videos, and other content that isn’t a web page, WordPress is not involved in the process. The web server (Apache in this case) serves the file without ever involving the PHP engine. So a request for http://example.com/wp-content/uploads/2015/01/my_private_file.pdf will be answered for anyone who asks for it. Scary, right?
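The reason WordPress never gets a say is visible in its standard Apache rewrite rules: a request is only handed to index.php when the path does not match an existing file (RewriteCond %{REQUEST_FILENAME} !-f) or directory. The Python sketch below mimics that decision for illustration only; `who_handles` is a hypothetical helper, and it ignores the directory case and any other server configuration.

```python
import os
import tempfile


def who_handles(docroot: str, request_path: str) -> str:
    """Roughly mimic WordPress's default Apache rewrite decision."""
    fs_path = os.path.join(docroot, request_path.lstrip("/"))
    if os.path.isfile(fs_path):
        if fs_path.endswith(".php"):
            return "php"      # existing PHP scripts are executed directly
        return "static"       # Apache sends the bytes; no WordPress code runs
    return "wordpress"        # rewritten to index.php, so plugins can check the user


with tempfile.TemporaryDirectory() as docroot:
    upload_dir = os.path.join(docroot, "wp-content", "uploads", "2015", "01")
    os.makedirs(upload_dir)
    open(os.path.join(upload_dir, "my_private_file.pdf"), "wb").close()
    print(who_handles(docroot, "/wp-content/uploads/2015/01/my_private_file.pdf"))  # static
    print(who_handles(docroot, "/members-only/"))                                   # wordpress
```

The uploaded PDF exists on disk, so it is served as-is, while the members page does not, so it goes through WordPress, where a plugin like WP-Members can block it.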
The solution to this problem is to have WordPress act as an intermediary between the request for the file, and the file itself, allowing permissions to be verified in the process.
Enter Download Monitor, a handy plugin that does exactly that in two ways:
- All files uploaded to the site through the Download Monitor custom content type are stored in a separate folder (/wp-content/uploads/dlm_uploads/ to be exact). This folder contains an .htaccess file (instructions for the Apache web server) that denies everyone direct access to the files in it (“deny from all” is the rule).
- Links in your posts/pages point to a new path managed by the Download Monitor plugin, for example /download/my_private_file/. The plugin checks whether the user has the correct permissions to access the file. If so, PHP reads the file on the server (because it accesses the file directly on disk rather than through a URL, the Apache .htaccess rules do not apply) and delivers it to the browser as a download.
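The two points above can be sketched in Python. This is not Download Monitor’s code (the plugin is written in PHP); `DownloadGateway`, `User.is_member`, and the 403/404 responses are hypothetical stand-ins, assuming membership is a simple yes/no flag. The sketch shows the two ingredients: files kept where the web server won’t serve them, and a handler that checks the user before reading the file from disk. It also counts successful downloads, the same spot where a plugin can gather download statistics.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class User:
    name: str
    is_member: bool


class DownloadGateway:
    """Hypothetical permission-checking intermediary for /download/<slug>/."""

    def __init__(self, private_dir: str) -> None:
        # Folder the web server refuses to serve directly ("deny from all")
        self.private_dir = private_dir
        self.files: Dict[str, str] = {}            # slug -> file name on disk
        self.download_counts: Dict[str, int] = {}  # slug -> successful downloads

    def register(self, slug: str, filename: str) -> None:
        self.files[slug] = filename

    def download(self, user: Optional[User], slug: str) -> Tuple[int, bytes]:
        """Return (HTTP status, body) for a request to /download/<slug>/."""
        if slug not in self.files:
            return 404, b""
        if user is None or not user.is_member:
            return 403, b""  # anonymous visitor or non-member: refuse
        # Reading from disk bypasses the web server, so .htaccess rules don't apply
        path = os.path.join(self.private_dir, self.files[slug])
        with open(path, "rb") as fh:
            body = fh.read()
        self.download_counts[slug] = self.download_counts.get(slug, 0) + 1
        return 200, body


with tempfile.TemporaryDirectory() as private_dir:
    with open(os.path.join(private_dir, "my_private_file.pdf"), "wb") as fh:
        fh.write(b"%PDF-1.4 demo")
    gateway = DownloadGateway(private_dir)
    gateway.register("my_private_file", "my_private_file.pdf")
    print(gateway.download(None, "my_private_file"))                # (403, b'')
    print(gateway.download(User("evi", True), "my_private_file"))  # (200, b'%PDF-1.4 demo')
    print(gateway.download_counts)                                  # {'my_private_file': 1}
```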
Besides protecting files that must be restricted to specific users, the plugin also gives you some additional features:
- The ability to see how many times a file has been downloaded, right in the WordPress dashboard.
- The ability to offer different versions of the same file for download (think software, where you might want to make versions 1.0 and 1.1 downloadable).
Problem solved!
[…] Hooper explains one method to protect PDFs in a members-only area of a WordPress website. In addition to protecting the PDFs, I liked that Shawn’s solution tracks how many times the […]
I am a big fan of these updates! Thanks for the new feature and tip. It makes it much easier for visitors to download files right after confirming, while still protecting pages.
Thanks Shawn! This is just what I’m looking for for a membership site I’m creating 🙂
Very useful article. Thanks a lot. I’ll have to prepare a members only blog soon and this is the info that I was looking for.
Thank you Shawn, this is exactly what I was looking for! Thanks for sharing. Do you know whether it is possible to let users view content such as a PDF in a new browser tab instead of starting the download immediately? Best regards, Evi
I read this post ages ago, Shawn, and just came back to it for help on a client site with downloadable products. Thanks again, Shawn! I always knew Google indexed media files, but I hadn’t built many sites with restricted content, so it didn’t matter.
Appreciate you sharing the problem, and the solution too.
Thanks for this post.
I only want to protect my files behind a Gravity Form – not necessarily a Member of the site.
Any ideas on how to protect the direct link to the PDF without requiring the downloader to be a *member*?
Hey David,
You might be interested in our Prevent Direct Access plugin, which offers exactly what you need.
– Protect the direct link to your PDF
– Stop search engines like Google from indexing the file
– Allow anyone (they don’t necessarily have to become your website member) to download that file through a private URL
Hi Shawn, I appreciate your post but was curious about one other scenario. We post PDFs for certain clients who get the direct link from an email or webpage, but we don’t want the PDFs to show up just because someone Googles something that might be in their names. Is there a way to make our Media Library unsearchable by search engines (but still work for anyone with the direct link)? Thanks!
Good question, Olivia. If these PDFs are just something you’d rather not have shown, but aren’t confidential, it could be as simple as blocking the uploads folder using robots.txt (instructing search engines not to index those folders). However, if the PDFs contain sensitive information, this might not be enough. I would look at storing this data in Amazon S3 private folders, and then providing your clients with links to the documents in those private folders. You could even have those direct links auto-expire after a specified timeframe if that’s something that would work for you.
Thanks Shawn! Blocking the uploads folder should work. Do you mind walking me through how to do that or pointing me to a site that explains it? Thank you!
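Auto-expiring links like these typically work by signing the file path together with an expiry time, so the server can reject links that were altered or have expired. Below is a minimal Python sketch of that general idea using an HMAC. It is not Amazon S3’s API (S3 has its own pre-signed URL mechanism); `SECRET_KEY`, `sign_link`, `verify_link`, and the files.example.com host are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; never share it with clients
SECRET_KEY = b"replace-with-a-long-random-secret"


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_link(base_url: str, path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a link to `path` that stops working `ttl_seconds` from now."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def verify_link(url: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and its signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how much of the signature matched
    return hmac.compare_digest(sig, _signature(parsed.path, expires))


link = sign_link("https://files.example.com", "/client-a/report.pdf", ttl_seconds=3600, now=1_000_000)
print(verify_link(link, now=1_000_500))   # True: still within the hour
print(verify_link(link, now=1_003_601))   # False: expired
print(verify_link(link.replace("client-a", "client-b"), now=1_000_500))  # False: path changed
```

The server that hands out the file would call something like `verify_link` before sending it, which keeps the link shareable with a client while limiting how long it stays useful.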
You’d have to put a robots.txt file in the root level of your website (so it’s accessible at http://example.com/robots.txt) with the following lines:
User-agent: *
Disallow: /wp-content/uploads/*/*/*.pdf
This should (though I haven’t tested it) block all PDFs in the uploads folders from being added to search engines.
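Because the rule relies on * wildcards, it can be handy to check which paths it actually covers. The Python sketch below implements the wildcard matching described in RFC 9309 (the robots.txt standard): * matches any run of characters, including /, and a trailing $ anchors the end of the path. `robots_rule_matches` is a hypothetical helper, not part of any crawler. Keep in mind that Disallow only asks well-behaved crawlers not to fetch the files; it does not stop anyone with the direct link from downloading them, which is what Olivia wants here.

```python
import re


def robots_rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' back into '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Patterns match from the start of the path (prefix semantics)
    return re.match(regex, path) is not None


rule = "/wp-content/uploads/*/*/*.pdf"
print(robots_rule_matches(rule, "/wp-content/uploads/2015/01/my_private_file.pdf"))  # True
print(robots_rule_matches(rule, "/wp-content/uploads/2015/01/photo.jpg"))            # False
print(robots_rule_matches(rule, "/wp-content/uploads/2015/01/notes.pdf.bak"))        # True: no '$'
print(robots_rule_matches(rule + "$", "/wp-content/uploads/2015/01/notes.pdf.bak"))  # False
```

Since matching is by prefix, the rule also covers any path that merely contains .pdf after two folder levels; adding $ limits it to paths that end in .pdf.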
Thanks Shawn! We’ll look into that as an option.
Hi Shawn, thanks for the above. I have a question about the solution you’ve described for protecting files. It mentions the Apache server, but what if the server uses something like nginx? Wouldn’t .htaccess be unavailable? Could the same method be applied using nginx.conf?
Looking at the code, I can see the plugin will display a notice when it detects nginx, and tell you which rules you need to add to your configuration.
I am assuming this post is true for any kind of files not just PDF?
Correct.
Thanks for the recommendation! Do you happen to know if the plugin would work with nginx?
It does. You need to modify your nginx.conf in order to block direct file access. Looking at the code, I can see the plugin will display a notice when it detects nginx, and tell you which rules you need to add to your configuration.
Thank you so much! This helps!