Joomla rewrites all the urls into https
I have been updating an old Joomla codebase (in web application terms, old = ~five years). I was attempting to integrate some javascript and hit an unexpected problem.
Something I find myself doing increasingly often is using javascript to integrate third party content into a website. In this case I had a Blogger blog feed, and I was going to use some javascript to parse and display the contents of the feed. When I ran it, the feed kept showing up broken. I have used this sort of code a ton before, so what was causing it to fail in this case?
I inspected, firebugged, and watched: the feed was being requested, but I was getting an ssl related error. Wait a second, why is ssl even involved?
// set to be the URL of your RSS feed
var feedURL = 'http://blogname.blogspot.com/feeds/posts/default';
That is the url in my source. In the delivered content, however, it looked like this:
// set to be the URL of your RSS feed
var feedURL = 'https://blogname.blogspot.com/feeds/posts/default';
Somewhere, something turned this url into https between what I had raw on the server and what was delivered to the client, automagically.
HTTPS Everything
Occasionally, mixing together how things used to be done and how things are done now causes problems. I think it was fair a few years ago to look at the problem of "httpsing" the site and decide that the best option was to enforce it at the application level. The average user was not going to fiddle with the webserver, and Joomla is supposed to be webserver agnostic, so delivering a blanket https everywhere in the application was the least worst option.
Now web applications are built with the entire environment in mind, and the webserver is part of that. I am an Nginx adherent. You would enforce the https requirement at the webserver level, redirecting traffic (80->443) or the like. The webserver may not even be the point where https terminates; that job may fall to an https proxy in front of it.
I slowly broke the ingrained habit of treating https as a thing to be used sparingly: only where necessary, with regular http for mostly everything else. You do not see that being a consideration much anymore. Https used to be a drag on the serving hardware, but efficiencies in both software and hardware have largely eliminated this as an issue. Big sites now just https everything. This is done at the webserver level: if anyone comes in via http, redirect to https right away.
Before, when you did not do this, you were selective: redirect to https if necessary. I have done this before, at the code level, for roughly the same reasons. The code was going off into uncertain hands and hosts, and under no circumstances was this "page" to operate over anything but https.
In php it was roughly like this (there were many more library calls and logging in my real version; this is unfolded and simplified), and as simple as checking the port and swapping the start of the url:
// Requests that arrive on the plain http port get bounced to https.
if ((int) $_SERVER['SERVER_PORT'] === 80) {
    // would have been a library call, modifying the url to start with https
    $url = 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    // another library call in the real version; sets the Refresh header, then exits
    header("Refresh: 0; URL={$url}");
    exit;
}
This was enforcement on the application level. I probably would not do it at the code level now, but delegate it to the webserver. It was the right decision at that time.
Back to the problem. Joomla is built around displaying user input. Yes, it is a "Content Management Framework", but for the most part it is used as a simple website content management system. The question looked to be "how do we secure all this user input?", and their answer was: when the "https everything" toggle is set, rewrite the outgoing content so the urls begin with https. Cool, that is one way to do it. But wait, it was not just rewriting local website urls, it was rewriting all urls no matter where they point. This was the source of my codenundrum:
Joomla was rewriting all my urls, even to external websites in a javascript block to be https.
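To make the difference concrete, here is a minimal sketch of a blanket rewrite next to one limited to the site's own host. It is only an illustration of the behavior I observed, not Joomla's actual code; the function names and the `example.com` host are made up for the example.

```javascript
// Hypothetical illustration only; this is not how Joomla implements it.

// Blanket rewrite: every http:// in the outgoing content becomes https://,
// whether it is in a link, inside a script block, or points at another host.
function rewriteAllUrls(html) {
  return html.replace(/http:\/\//g, 'https://');
}

// Narrower rewrite: only urls on the site's own host are upgraded.
function rewriteLocalUrls(html, siteHost) {
  // Escape regex metacharacters in the host name before building the pattern.
  const escapedHost = siteHost.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // The lookahead stops "example.com" from matching "example.com.other.org".
  const localUrl = new RegExp('http://(' + escapedHost + ')(?=[/:"\'\\s]|$)', 'g');
  return html.replace(localUrl, 'https://$1');
}

// Sample page: one local link and one external feed url in a script block.
const samplePage =
  '<a href="http://example.com/about">About</a>' +
  "<script>var feedURL = 'http://blogname.blogspot.com/feeds/posts/default';</script>";

const blanket = rewriteAllUrls(samplePage);
const localOnly = rewriteLocalUrls(samplePage, 'example.com');
```

With the blanket version the Blogger url inside the script block is switched to https along with everything else, which is the behavior that broke my feed. The host-limited version leaves it alone.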
Now I knew the problem, so how to solve it?
Local https proxy for http-only content
One of the things I could not do was disable the https everywhere setting. This was a large, involved codebase that had been through multiple development firms. Doubtless there were parts of it that assumed https was in place. Even if not, it would be asking for trouble to disable a global setting like that on a codebase this gnarly. It was enabled for a reason, even if I did not know the reason.
What I needed to do was get that feed content over https. I was not going to get that from Blogger, since they do not serve their feed content over https. I eventually settled on creating a proxy feed on my own server, which turned out to be rather trivial. Since I was already proxying it, I might as well fetch it as json and pass the json down, to make processing it on the client side easier.
It ends up looking something like this (edited down for clarity):
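// Fetch the Blogger feed as json on the server side, over plain http.
$data = file_get_contents("http://blogname.blogspot.com/feeds/posts/default?alt=json");
if ($data === false) { http_response_code(502); exit; } // upstream fetch failed
header('Content-Type: application/json');
echo $data;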
Say it was located on the server at "/blog-feed". Now you would just rewrite the url in the original display javascript to use the proxy of the feed:
// set to be the URL of your RSS feed
var feedURL = '/blog-feed';
Now you get to keep the https rewriting on everything and still get your feed.
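For the client side, a minimal sketch of pulling post titles and links out of the proxied json might look like the following. The field names (`feed.entry`, `title.$t`, and the `link` entry with `rel` set to `alternate`) are my assumption about what Blogger returns for `?alt=json`, so check them against your real response. `extractPosts` and `loadPosts` are names invented for this example.

```javascript
// Assumed shape of the Blogger json feed (verify against a real response):
// { feed: { entry: [ { title: { $t: '...' },
//                      link: [ { rel: 'alternate', href: '...' } ] } ] } }
function extractPosts(feedJson) {
  const entries = (feedJson && feedJson.feed && feedJson.feed.entry) || [];
  return entries.map(function (entry) {
    const links = entry.link || [];
    // The 'alternate' link is assumed to be the human-readable post page.
    const alternate = links.find(function (link) {
      return link.rel === 'alternate';
    });
    return {
      title: entry.title ? entry.title.$t : '',
      url: alternate ? alternate.href : null
    };
  });
}

// Browser-side usage against the same-origin proxy; defined but not called here.
function loadPosts(feedURL) {
  return fetch(feedURL)
    .then(function (response) { return response.json(); })
    .then(extractPosts);
}
```

In the page you would call something like `loadPosts('/blog-feed')` and render the resulting list. Because the proxy url is relative, there is no `http://` in it for the rewriting to touch.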