PHP Upgrades with Rector
6/24/2023
When I first arrived at my previous company, I was somewhat surprised to find their main monolith service was still running PHP 5.6. This ran the bulk of their business and while many efforts were underway to break it into services, years of effort hadn't been able to make much progress on fully eliminating "legacy."
Moving to services is an understandable goal but a difficult one. Many teams fall into the trap of dropping support for the old before the replacement is fully ready due to support costs (and maybe the ick factor of dealing with the older codebase). In this case, the growing complexity of services meant that the monolith was missing significant maintenance for far too long.
There were efforts to upgrade, of course. But they quickly ran into problems migrating many thousands of files written in PHP 4-style code, along with their dependencies. The model of many developers sitting at keyboards led to mistakes and long testing cycles. We didn't have the people, time or energy to pull off a manual upgrade.
Research
What was needed was an automated approach, one that could be tested once and then trusted to migrate code mistake-free. This is also where my experience in other languages came in handy: I was thinking about codemods on the front-end and how they have enabled many open source frameworks to move quickly by providing a nice upgrade path.
Initially I planned to write this myself. I found a great looking parser and then (through that) found the project of my dreams had already been written! Rector is an awesome project meant to do just that: help write upgrades for PHP code.
Next, I started examining automated reports of the compatibility issues in the current code base. csfixer did a thorough job of reporting almost all the issues we would come across with newer PHP versions.
While running csfixer and experimenting with Rector, I quickly found out that we'd have to do some initial cleanups before the tools could run reliably. Our code base had some ancient parts with a mix of ISO-8859-1 encoding and (occasionally) Windows line endings. I ran iconv and dos2unix over the repo.
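iconv and dos2unix did the work on the command line. The snippet below is only a sketch of the same idea in PHP, under two assumptions that are not from the original cleanup: any file that is not already valid UTF-8 is treated as ISO-8859-1, and UTF-8 with Unix line endings is the target. The helper name normalizeSource is hypothetical.

```php
<?php
// Hypothetical helper: assumes input that is not valid UTF-8 is ISO-8859-1,
// and that UTF-8 with Unix (LF) line endings is the desired result.
function normalizeSource(string $bytes): string
{
    // preg_match with the /u modifier returns false on invalid UTF-8 input.
    $isUtf8 = preg_match('//u', $bytes) === 1;
    if (!$isUtf8) {
        $converted = iconv('ISO-8859-1', 'UTF-8', $bytes);
        if ($converted === false) {
            throw new RuntimeException('Could not convert input to UTF-8');
        }
        $bytes = $converted;
    }
    // Convert Windows (CRLF) line endings to Unix (LF), as dos2unix does.
    return str_replace("\r\n", "\n", $bytes);
}
```

Applied to a file, that would look something like file_put_contents($path, normalizeSource(file_get_contents($path))). Checking for valid UTF-8 first keeps the conversion from double-encoding files that were already clean.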
I also found that the formatting was so bad in some areas that the parser would fail. I tried a number of automated formatters but wasn't happy with the output of most of them. They either weren't opinionated enough on whitespace issues (causing poor output after Rector) or sometimes produced broken code. In the end, the most reliable one I found was my IDE. It took a few hours but IntelliJ performed admirably in adverse conditions, with the help of a generous heap adjustment.
I shouldn't have been surprised. A ton of effort goes into JetBrains' products to make them good at producing code just the way you want it. I already had settings matching our project standards, so I just needed to take an extended coffee break while it worked its magic.
Migrating While Each
Unfortunately, while Rector has migrations for a lot of things, we had thousands of generated files using the old PHP each function in loops and list destructuring assignments. Rector has migrations (called "rectors") for each already but while experimenting with it, I found we had some particular needs not well covered.
I needed the migrated code to behave as closely as possible to the original code. For example, some code was suppressing errors using @each. The migrated code would have to suppress errors as well. There was simply too much code to consider other improvements on the initial pass. The resulting code had to behave the same way.
I started with the original rector and was able to modify it to handle some additional cases. For one, I added an is_array check before loops with error suppression. At least for our data and code, it ran the same way after migration. One of the unit tests looked like:
<?php
while (list(, $item) = @each($a)) {
    echo("I'm a statement!");
}
?>
-----
<?php
if (@is_array($a)) {
    foreach ($a as $item) {
        echo("I'm a statement!");
    }
}
?>
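To see what the guard buys, here is a small runnable sketch of the migrated shape. A bare foreach over a non-array emits a warning (in PHP 8, "foreach() argument must be of type array|object"), which the original @each had kept quiet, so the guard simply skips the loop instead. The function name collectItems and its inputs are hypothetical, and the reading that the @ on is_array is there to silence undefined-variable notices is my assumption.

```php
<?php
// Hypothetical helper mirroring the migrated shape: guard first, then foreach.
function collectItems($a): array
{
    $items = [];
    // The @ keeps the check quiet; is_array() returns false for
    // null, strings, and other non-array values.
    if (@is_array($a)) {
        foreach ($a as $item) {
            $items[] = $item;
        }
    }
    return $items;
}
```

With this shape, collectItems(null) and collectItems('oops') both return an empty array without emitting anything, while collectItems(['x', 'y']) iterates normally.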
The relevant code was added after the rest of the rector had produced a $forEach node:
if ($assign->expr instanceof ErrorSuppress &&
    // don't need to check _SESSION
    $assignExpr->args[0]->value->name !== '_SESSION') {
    // wrap the generated foreach in: if (@is_array(<iterated expression>))
    $if = new Node\Stmt\If_(
        new ErrorSuppress(
            $this->nodeFactory->createFuncCall('is_array',
                [
                    $assignExpr->args[0]->value
                ]
            )
        ),
        ["stmts" => [$forEach]]
    );
    // keep the attributes collected earlier on the replacement node
    $if->setAttributes($copyAttrs);
    $node = $if;
}
Easy! It also had to migrate loops with additional conditions, like:
while ((list ($key, $val) = each ($client_nom_table)) && $var == 0) {
    ...
}
That became:
foreach ($client_nom_table as $key => $val) {
    if (!($var == 0)) {
        break;
    }
    ...
}
It wasn't the prettiest thing but inverting an entire unknown expression could have become tricky, so the rector simply wrapped the expression in !() in order to handle this stop condition.
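A small runnable sketch of that output shape shows the stop condition at work. Only the loop structure comes from the migration above; the function name keysUntilFlag, the sample data, and the way the body flips $var are hypothetical. Putting the negated check first in the body mirrors the original order, where the each() assignment ran before $var == 0 was evaluated.

```php
<?php
// Hypothetical example: only the foreach/break shape comes from the migration.
function keysUntilFlag(array $table, $stopAt): array
{
    $var = 0;
    $seen = [];
    foreach ($table as $key => $val) {
        // Negated original condition, checked before the body runs.
        if (!($var == 0)) {
            break;
        }
        $seen[] = $key;
        if ($val === $stopAt) {
            $var = 1; // the body flips the flag, so the next pass breaks
        }
    }
    return $seen;
}
```

Calling keysUntilFlag(['a' => 1, 'b' => 2, 'c' => 3], 2) visits 'a' and 'b' and then stops, just as the while loop would once $var was no longer 0.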
A Year of Updates
That's pretty much how it went for a year. I decided early on to progress the code one point release at a time, since PHP point releases often add new deprecation warnings that need to be acted on. I also worried about overwhelming our logging infrastructure with notices and warnings.
Between other projects, I would update our staging and CI infrastructure and monitor it for errors. Very few issues came up due to migrations. There were a couple of cases where PHP's type system suddenly got stricter between point releases, but that was rare and easily handled during testing.
The first update from 5.6 to 7.0 dropped our platform's response times by 50%. That was amazing to see and a credit to all the PHP developers who worked so hard to improve the runtime. Each update continued adding performance improvements, but on the way to 7.4 we didn't see huge gains like that again. Unfortunately, my time with the company ended before we reached 8.0 and greater, or we may have seen another big drop.
After each runtime upgrade we would follow up with an update to Composer and our libraries. We used Rector to update PHPUnit (which by itself was the source of many more annoying changes than the entire language).
Never done
This job is a common one and is never really done. PHP is releasing another update soon which will undoubtedly create more change and more code migrations. Automated updates are essential, though: manually updating is possible but error-prone. Figuring out Rector was slow at first but became faster and allowed me to handle code migrations for things other than pure language updates. It was useful for our own code improvements, too.
It will be interesting to see if AI takes this job over in the future, though I suspect that it's a space where errors aren't acceptable. Squishy machine logic might not be reliable enough, the same way relying on squishy human logic didn't work.
Meanwhile, I'm back using languages that make better sense to me. On one hand, I agree with the belief that good software can be written using any language, but a bigger part of me believes it's more difficult in PHP. I'm glad to move on but PHP is improving a lot. Maybe the next time I run into it will be the time it sticks.
Doubtful.