chunk diffing, shell scripts, and pain

Intro

A recommended song for reading this article is Forgotten Friend by Mammal Hands, as it's what I listened to during a large portion of the events described.

I play a Minecraft modpack with a close friend of mine, Shea. Specifically, the modpack is Better MC 5 NeoForge, which runs on Minecraft 1.21.1. We started playing this modpack together approximately 5 months ago, around Christmas time, and have played it frequently since. When we started, the modpack version was v45(.0). The server we played on was hosted on a VPS located in ██████.

Over time, we would gradually see the version number on an update alert increase: 46, 47, 48... Eventually, Shea and I agreed that I would update the modpack once version 50 was released. I was hesitant to do so for fear of issues arising from the update, as there is and was no built-in server-side updating mechanism (though updating the clients is like, four clicks and forty seconds in Prism Launcher).

Around a week ago, I had nothing to do while on a call with Shea, and I decided to finally update the server, as version 50 had released. My worries would be proven valid.

Updating

Updating itself wasn't that difficult, nor scary, as I take backups every 72 hours using Backrest (though I took another simple backup of the modpack folder with the copy command, just in case). I simply shut down the server, downloaded the most recent version with wget, unzipped, copied over the mods and config folders from the newest version, and turned on the server. There were no problems in the slightest, except for one catastrophic issue:

Every single pale oak thing—a type of block used in almost every single one of our builds—in our world was gone. What possibly could have gone wrong?

Naturally, I looked at the logs, and found out why almost immediately:

ResourceKey[minecraft:root / minecraft:block]: vanillabackport:pale_oak_trapdoor -> using default);

This was repeated for every pale oak thing in the world; only one log message is shown here for brevity.

Essentially, because Vanilla Backport no longer had the pale oak blocks, and the pale oak blocks were all under the vanillabackport namespace, the game could not find the pale oak blocks, so they were all transformed into air. To this day, I'm not sure why this happened: the modpack was still on 1.21.1, and pale oak trees were only added in a 1.21.2 snapshot, but somehow the pale oak trees were moved to the minecraft namespace. All I could find was this blurb in the v48 changelog for BMC5:

Added all content from Minecraft versions 1.21.2 - 1.21.5, alongside hundreds of bug fixes and improvements

How were we playing with pale oak planks on v45???

Well, now what? I still had a backup of the world at this point, but I did not have much time to figure out what may have gone wrong. It was around 10pm by this point.

Research

Firstly, I had a vague idea of what to do: modify the chunk files to bring the blocks back to normal. I couldn't get any coordinates from the log files, so I had to extract the coordinates of the pale oak blocks from the old chunk (mca) files, and then somehow put the blocks back into the most recent version of the server.

Thankfully, most of this was documented fairly well on the Minecraft Wiki. I started by looking at Anvil File Format and Region File Format, and found NBTExplorer around this time. Finding the chunk files in question for the specific areas of the world was not hard in the slightest, because the F3 menu tells the exact position of the current chunk within a certain file.

NBTExplorer, the program of all time

After using NBTExplorer for some time, I realized that it would be essentially useless, because most blocks do not have any NBT data associated with them. NBT data is used for storing, well, data. This data may include orientation, content (such as on a sign), et cetera. Simple blocks just don't have anything to store.

Following this, I went down a deep, deep rabbit hole searching for other potential programs for editing individual blocks (I still didn't know how I was going to extract the coordinates of the missing blocks). Sadly, most world editing software is paid, not available on Linux, or simply for outdated versions of Minecraft. I did find something that appeared to be what I was looking for (though I have no idea if it would have actually worked): MCA2Nbt. However, this software is seriously outdated and only works on FreeBSD systems (you know you're in some deep shit when you stumble upon software only for FreeBSD). I was not in the mood for editing C code, so I moved on.

xkcd 1350

Since I hit a dead end with software, I decided to ask in the modpack's server, where I was ignored once, and then told the Minecraft equivalent of "Just take tylenol, I'm sure your chest pain is nothing." Then, I contacted the developer of the Vanilla Backport mod, and received a hopeful response.

From here, I simply decided to wait it out. As of writing this on May 30th, no patch has been released, nor have there been any pertinent commits on the Vanilla Backport GitHub, and I was ignored upon asking for an update in the Vanilla Backport Discord server.

A few days later, in Jason's anatomy class, I had an idea that would go on to form the basis for solving this problem.

Diffing

The diff command, in my opinion, is one of the most powerful Linux commands. I'd already used it a couple of times in the past for various tasks, but I had a real-world use for it here. Firstly, I went into Minecraft and grabbed the file name of a region in which I was certain pale oak blocks were turned into air: r.1.5.mca. I copied this file from both the pre-update world and post-update hellscape to my primary machine, and ran the diff command. Obviously, I was not expecting straight text stating "(x,y,z) was sent to the cognitive realm!", but I only got gibberish. I didn't have too much time that day, so I decided to try again another day.

In my deep software rabbit hole, I found a Python library which would go on to be crucial in this journey: anvil-parser2. I originally brushed it off as being useless, but around this time, I realized the role it could play in diffing the relevant mca files. I vibecoded a quick Python script for diffing (specifically, checking if a block was filled in the old file, but turned to air in the new file), let it run until my room was hot, and it just hung indefinitely (realistically, I should've added verbose output here, but that ship has sailed). Subsequently, I decided to try again another day.

Around two days later, I realized how bad the original script was: it ran a get_block check on every single block in every single chunk. Each chunk contains 16 * 384 * 16 (98,304) blocks, and each mca file contains 32^2 or 1024 chunks, which means 201,326,592 get_block calls (~100m, multiplied by two because we're diffing two MCA files) were made for one run of diffing. That alone would take ages, as the language in question is Python, and it doesn't even include the actual comparisons being made. There was no way that script would ever have finished in any reasonable amount of time. Optimizations had to be made.
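The arithmetic above can be double-checked in a few lines of Python. This is only a sanity check of the numbers in the paragraph; the constant names are made up for illustration and do not come from the original script.

```python
# Sanity check of the call count; all names here are illustrative.
BLOCKS_PER_CHUNK = 16 * 384 * 16   # x * y * z, using the 384-block height from above
CHUNKS_PER_REGION = 32 ** 2        # one .mca file holds 32 x 32 chunks
FILES_COMPARED = 2                 # the old file and the new file

blocks_per_region = BLOCKS_PER_CHUNK * CHUNKS_PER_REGION
total_calls = blocks_per_region * FILES_COMPARED

print(f"{BLOCKS_PER_CHUNK:,} blocks per chunk")
print(f"{blocks_per_region:,} blocks per region file")
print(f"{total_calls:,} get_block calls per diff")
```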

For reference, whenever I write "first (mca) file", I am referring to the old MCA file, with the pale oak blocks. Whenever I write "second (mca) file", I am referring to the newer MCA file, without the pale oak blocks.

Firstly, NVMe SSD read speeds are blazing fast: around 35 gigabytes per second in my case. We can simply read the raw chunk data (without any Python parsing) and compare it to that of the same chunk in the second MCA file; if the two are identical, we can move along. This alone saves an incredible amount of time. Next, we can skip any blocks that are air in the first file (cutting off ~50% of comparisons), as a block that was already air cannot have been turned into air by the update.
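A minimal sketch of the raw comparison, assuming the layout described on the Region File Format page: a 4096-byte location table at the start of the file, with one 4-byte entry per chunk (a 3-byte sector offset and a 1-byte sector count), and each chunk stored as a 4-byte length followed by the compressed payload. The function names are hypothetical and this is not the code from my script. It only compares the still-compressed bytes, so a mismatch means "look closer", not "a block changed".

```python
import struct
from typing import List, Optional

SECTOR = 4096  # region files are organised in 4 KiB sectors


def raw_chunk(region: bytes, index: int) -> Optional[bytes]:
    """Return the still-compressed payload of chunk `index` (0..1023), or None if absent."""
    entry = region[index * 4 : index * 4 + 4]
    offset = int.from_bytes(entry[:3], "big")  # position of the chunk, in sectors
    if offset == 0 or entry[3] == 0:
        return None                            # chunk was never generated
    start = offset * SECTOR
    (length,) = struct.unpack(">I", region[start : start + 4])
    # `length` covers the compression-type byte plus the compressed data
    return region[start + 4 : start + 4 + length]


def changed_chunks(old: bytes, new: bytes) -> List[int]:
    """Indices of chunks whose raw bytes differ; only these need full decoding."""
    return [i for i in range(1024) if raw_chunk(old, i) != raw_chunk(new, i)]
```

Both arguments would be the full contents of a region file, for example from `pathlib.Path("r.1.5.mca").read_bytes()`.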

After this part, most of the optimizations were written by LLMs, but I understand them regardless. First, each section (one of the 16x16x16 cubes that stack up to form a chunk) is flattened from 3D into a 1D array of 4096 numbers, where we can decode whether a certain block is filled in the first file, and then decode the same block in the second file, checking if it's air as we do so. Decoding is fairly simple, as each section stores its own palette, which assigns short numeric codes to the blocks it contains for compression purposes. Additionally, math is kept as simple and straightforward as possible here.
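To make the decoding step concrete, here is a rough sketch of how one section can be unpacked. It is not the code from my script. It assumes the post-1.16 storage layout from the wiki's chunk format page: at least 4 bits per block, entries never straddling two 64-bit longs, and blocks ordered by y, then z, then x. The palette (a list of block names) and the data (a list of signed longs) are assumed to have been read from the section's block_states tag already, and both function names are hypothetical.

```python
from typing import List, Tuple


def decode_section(palette: List[str], data: List[int]) -> List[str]:
    """Expand one 16x16x16 section into 4096 block names, ordered y, then z, then x."""
    if len(palette) == 1 or not data:
        return [palette[0]] * 4096           # single-block sections carry no data array
    bits = max(4, (len(palette) - 1).bit_length())
    per_long = 64 // bits                    # entries never straddle two longs
    mask = (1 << bits) - 1
    names = []
    for i in range(4096):
        word = data[i // per_long] & 0xFFFFFFFFFFFFFFFF  # NBT longs are signed
        names.append(palette[(word >> ((i % per_long) * bits)) & mask])
    return names


def lost_blocks(old: List[str], new: List[str]) -> List[Tuple[int, int, int, str]]:
    """Return (x, y, z, name) within the section for blocks that became air."""
    lost = []
    for i, (before, after) in enumerate(zip(old, new)):
        if before != "minecraft:air" and after == "minecraft:air":
            # index = y * 256 + z * 16 + x
            lost.append((i & 15, i >> 8, (i >> 4) & 15, before))
    return lost
```

The coordinates returned here are relative to the section, so they still need the chunk and region offsets added before they mean anything in the world.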

An example of a line of output from the script:

1008 65 3183: pale_oak_planks -> air

The full script can be found on pastebin.

Now, with this god-tier script, I was able to diff 1024 chunks in ~20 seconds on my Ryzen 5 7600X. This gave me an output of altered blocks for the file r.1.5.mca, but I was not close to being done yet.

Shell Scripts

This was by far the most boring section to go through. Firstly, I needed to find all files with differences between the old and the new version. Because Shea and I hadn't played at all after the update, these differences would presumably only be in relation to the disappearance of the pale oak blocks. This was actually very easy to do, as I was just able to run:

diff regions/region_old/ regions/region_new/ > diff_files.txt

This command gave an output of

Binary files regions/region_old/r.-1.-2.mca and regions/region_new/r.-1.-2.mca differ
Binary files regions/region_old/r.1.5.mca and regions/region_new/r.1.5.mca differ
Binary files regions/region_old/r.1.6.mca and regions/region_new/r.1.6.mca differ
Binary files regions/region_old/r.2.5.mca and regions/region_new/r.2.5.mca differ
Binary files regions/region_old/r.2.6.mca and regions/region_new/r.2.6.mca differ

For some reason, the r.2.* files and the first file listed did not have any blocks that were transformed to air. I'm honestly not sure why they had differences, but I'm not really concerned about that.
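The same file-level check can be sketched in Python with the standard library's filecmp module, which is handy if the list of changed files should feed straight into the diffing script. The function name is made up for this example; the directory names are the ones from the diff command above.

```python
import filecmp
from pathlib import Path
from typing import List


def differing_regions(old_dir: str, new_dir: str) -> List[str]:
    """Names of .mca files present in both directories whose bytes differ."""
    changed = []
    for old_file in sorted(Path(old_dir).glob("*.mca")):
        new_file = Path(new_dir) / old_file.name
        # shallow=False forces a byte-by-byte comparison instead of trusting file metadata
        if new_file.exists() and not filecmp.cmp(old_file, new_file, shallow=False):
            changed.append(old_file.name)
    return changed


if __name__ == "__main__":
    for name in differing_regions("regions/region_old", "regions/region_new"):
        print(name)
```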

After running the diffing Python script on all of the files, I concatenated the outputs into a single compiled.txt by running the following for each output file:

cat x.txt >> compiled.txt

From here, I needed to turn these outputs into setblock commands, and then somehow feed those commands into the server. I wrote a simple shell script to turn the contents of compiled.txt into setblock commands. The shell script is reproduced below:

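#!/bin/bash
# Reads diff output lines such as "1008 65 3183: pale_oak_planks -> air"
# from the file given as the first argument and writes one setblock
# command per line to commands.txt.

FILE=$1

declare -a blocks
declare -a coords

# Collect block names: drop everything from the arrow onwards, keep only
# letters, underscores and newlines, then prefix the minecraft namespace.
addblocks() {
    while IFS= read -r line; do
        blocks+=("$line")
    done < <(
        sed 's| *->.*||' "$FILE" \
        | tr -cd '[:alpha:]\n[=_=]' \
        | sed 's|^|minecraft:|'
    )
}

# Collect coordinates: keep only digits and whitespace from each line.
addcoords() {
    while IFS= read -r line; do
        coords+=("$line")
    done < <(
        tr -cd '0-9[:space:]\n' < "$FILE"
    )
}

# Pair each coordinate line with the block name at the same index.
formcommands() {
    for i in "${!coords[@]}"; do
        echo "setblock ${coords[i]} ${blocks[i]}"
    done > commands.txt
}

addblocks
addcoords
formcommands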

An example line from commands.txt:

setblock 412 67 184 minecraft:pale_oak_planks replace

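Slight tangent: three extra spaces always end up before minecraft. The cause turns out to be the tr call in addcoords, which keeps every whitespace character, so the three spaces surrounding the block name and the arrow survive as trailing spaces on each coordinate line. I just made a simple sed command to fix this issue and moved on: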

sed 's/.\{3\}\(minecraft\)/\1/g' commands.txt > commands_fixed.txt
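For comparison, the same transformation can be sketched in Python by splitting each line on the colon and the arrow instead of filtering characters. This is an alternative I did not actually use, and the function name is hypothetical; it assumes every line has the "x y z: block -> air" shape shown earlier.

```python
def to_setblock(line: str) -> str:
    """Turn '1008 65 3183: pale_oak_planks -> air' into a setblock command."""
    coords, change = line.split(":", 1)          # coordinates sit before the first colon
    block = change.split("->", 1)[0].strip()     # block name sits before the arrow
    x, y, z = coords.split()
    return f"setblock {x} {y} {z} minecraft:{block}"


if __name__ == "__main__":
    print(to_setblock("1008 65 3183: pale_oak_planks -> air"))
```

Splitting this way leaves no stray spaces to clean up, and it keeps the minus sign on negative coordinates, which a digits-only filter would drop.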

Now, I needed some way to pipe these commands into the Minecraft server. Thankfully, with a quick search, I was able to find the RCON protocol, which essentially acts as a sort of API for sending commands to a Minecraft server. After a few quick changes to server.properties, RCON was enabled. I downloaded ARRCON as a client to use for this project, as it was the first result online.
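For the curious, this is roughly what an RCON client puts on the wire. The sketch assumes the Source RCON packet layout that Minecraft follows: three little-endian 32-bit integers (remaining length, request id, packet type), then the command text and two null bytes. Type 3 is a login and type 2 is a command. The function name is made up, and the sketch only builds packets; a real client such as ARRCON also handles the socket and the responses.

```python
import struct

LOGIN, COMMAND = 3, 2  # packet types from the Source RCON protocol


def rcon_packet(request_id: int, kind: int, body: str) -> bytes:
    """Build one RCON packet: length, request id, type, body, two null bytes."""
    payload = struct.pack("<ii", request_id, kind) + body.encode("ascii") + b"\x00\x00"
    # the length field counts everything after itself
    return struct.pack("<i", len(payload)) + payload


if __name__ == "__main__":
    print(rcon_packet(1, COMMAND, "list").hex())
```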

Then, I wrote a quick shell script to use commands.txt as an input for rcon:

#!/usr/bin/env bash
set -euo pipefail

HOST=localhost
PORT=25575
PASS="do you think i am stupid?"
COMMAND_FILE=$1

while IFS= read -r command || [[ -n "$command" ]]; do
  if [[ -z "$command" || "$command" == \#* ]]; then
    continue
  fi
  echo "Sending: $command"
  arrcon -H "$HOST" -P "$PORT" -p "$PASS" "$command"
done < "$COMMAND_FILE"

Finally, it was time to try this. I thought that there was absolutely no way this could fail, that the script was so simple it could not break, right?

That position is not loaded.

Okay, great, awesome. Thankfully, there is a forceload command which should be able to fix this issue pretty easily. I wrote another quick shell script to load chunks, and it didn't work. I vibecoded a shell script to load chunks, and it didn't work either. I almost went insane, until I realized:

412 67 184

Those coordinates are not even remotely close to where the pale oak blocks are actually located (around 1000, 67, 3000). This was caused by an error near the end of diffs.py, where Claude didn't use the correct formulas (reproduced below), for some reason:

region_x * 512 + cx * 16 + x = world_x
region_z * 512 + cz * 16 + z = world_z

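Here, cx and cz are the chunk's x and z coordinates within the region (0 to 31), and x and z are the block's coordinates within the chunk (0 to 15). The y coordinate needs no conversion, as chunks and regions are only indexed by (x, z), and the mca files store the region x and z in their file names.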
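As a small sketch of the conversion (not the code from diffs.py, and with made-up function names), the region indices can be parsed out of the file name and combined with the chunk and block offsets:

```python
from typing import Tuple


def region_of(filename: str) -> Tuple[int, int]:
    """Parse 'r.1.5.mca' into (1, 5); negative indices like 'r.-1.-2.mca' work too."""
    _, rx, rz, _ = filename.split(".")
    return int(rx), int(rz)


def to_world(filename: str, cx: int, cz: int, x: int, z: int) -> Tuple[int, int]:
    """Convert chunk-in-region (cx, cz) and block-in-chunk (x, z) to world x and z."""
    rx, rz = region_of(filename)
    return rx * 512 + cx * 16 + x, rz * 512 + cz * 16 + z


if __name__ == "__main__":
    print(to_world("r.1.5.mca", 31, 0, 0, 15))
```

Leaving out the region terms yields coordinates between 0 and 511, which look plausible at a glance but point at the wrong part of the world.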

Sadly, I don't have the original formulas used, so you can't laugh at them.

Now, I had to run the diffing process again for r.1.5.mca and r.1.6.mca in order to fix these coordinates (I probably could've written a third shell script to modify commands.txt accordingly, but someone else can do that). I ran the command formation shell script again, and then finally, I was ready to do another attempt at running the setblock commands.

After a handful of small tweaks to the send_commands.sh file (which I sadly don't remember, nor do I want to remember), it finally worked! (With one large issue)

Here, you can see both my happiness and disappointment unfold simultaneously:

Now, it is apparent that all the stairs were rotated the wrong way, but I honestly didn't care at this point. I was just happy that all of our builds were back. There are still a few other weird issues: slabs are only in the top position, some slabs are missing, and there's a cluster of like 10 blocks that I can't get back no matter what. Still, I could not force myself to go back down the NBT rabbit hole to fix those issues. A simple wrench and a little bit of teamwork in the game will allow us to fix them easily.

Conclusion

In conclusion, the developers of BMC5 are... interesting, to say the least. I don't want their heads on sticks for this issue, since I know how hard developing a modpack can be, but this is gross negligence at best. I am shocked that Shea and I were the only two people to ever encounter this error in the modpack. If anyone can explain how this occurred, please let me know, and I will be incredibly grateful.

Additionally, let this be a call to all modpack developers to simply ship an update mechanism with the server pack. It's honestly shocking that out of all the modpacks I've played, I have never found even ONE with this feature.

In the end, I am not a developer, but I do my best to think like one when I have to.