Have you ever spent so long on a simple problem that you forget why you started on it in the first place?

I deleted my Facebook account years ago, and recently Twitter went the same way. So I pulled my data from Instagram in preparation for killing that account as well. Then I thought I should do something with the data I'd been liberating from these silos.

Because Instagram was the last one I'd downloaded, I thought I'd start there. Once I'd figured out which file contained the actual post data, I set about writing a converter to consume the posts_1.json file and spit it out in a format I could build something from. This is where the problems started.

Emoji Problem

Like most people, I'd taken to adding some flair to my Insta posts in the form of emoji, mostly smiley faces. But how are they represented when all you get is a flat JSON file of strings?

Take, for example, this string:

"title": "Gator bites \u00f0\u009f\u0098\u0081"

which, when I saved out the new file, became:

"title" = "Gator bites 😁"
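The four escapes in the original string make more sense as bytes. This small standalone console sketch (not part of the converter) takes the byte values F0 9F 98 81 and decodes them together as UTF-8, which yields one emoji, 😁 (U+1F601). In a .NET string that single code point is stored as two UTF-16 chars.

```csharp
using System;
using System.Text;

// The four \u00XX escapes from the export, taken as raw byte values.
byte[] bytes = { 0xF0, 0x9F, 0x98, 0x81 };

// Decoded together as UTF-8, they make a single character.
string emoji = Encoding.UTF8.GetString(bytes);

// .NET strings are UTF-16, so that one code point is stored as a surrogate pair.
int codePoint = char.ConvertToUtf32(emoji, 0);

Console.WriteLine($"U+{codePoint:X}"); // U+1F601
Console.WriteLine(emoji.Length);       // 2
```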

Solution

I spent so long playing with file encodings and checking the individual character values that, in the end, I realized why I couldn't find the character sequence I was expecting. The export stores each UTF-8 byte of an emoji as its own \u00XX escape (the four in the example are the bytes F0 9F 98 81, which together encode 😁), and JsonSerializer.Deserialize faithfully turns every escape into a separate character, so the emoji comes out as ð followed by three invisible control characters. What I needed to do was process the file before deserializing the stream.
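To see what the deserializer actually hands back, here is a minimal sketch, assuming System.Text.Json on .NET Core 3.0 or later. It deserializes just the escaped title from the example as a JSON string and prints the code points of its last four characters: each \u00XX escape comes back as one char of its own, so the emoji never appears.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// The escaped title from the export, as a bare JSON string value.
string json = "\"Gator bites \\u00f0\\u009f\\u0098\\u0081\"";

// System.Text.Json maps every \u00XX escape to a single UTF-16 char with that value.
var title = JsonSerializer.Deserialize<string>(json) ?? "";

// List the code points of the last four chars (where the emoji should be).
string tail = string.Join(" ", title.Substring(title.Length - 4).Select(c => $"U+{(int)c:X4}"));
Console.WriteLine(tail); // U+00F0 U+009F U+0098 U+0081
```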

So this:

var jsonStr = File.ReadAllText("posts_1.json");

var jsonRoot = JsonSerializer.Deserialize<Root[]>(jsonStr);

Becomes this:

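var jsonStr = File.ReadAllText("posts_1.json");

// Turn every \u00XX escape into a URL-style %XX escape, so the bytes
// stay as plain ASCII text until they are decoded later.
string stringToFind = "\\u00";
jsonStr = jsonStr.Replace(stringToFind, "%");

var jsonRoot = JsonSerializer.Deserialize<Root[]>(jsonStr);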
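A self-contained sketch of the same pre-processing step, using a hypothetical one-field JSON object in place of posts_1.json and a Dictionary<string, string> in place of the Root type. After the replacement the serializer only sees ASCII inside the value, so the emoji's bytes survive as %XX text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical one-field stand-in for a post, escaped the way the export does it.
string jsonStr = "{\"title\": \"Gator bites \\u00f0\\u009f\\u0098\\u0081\"}";

// The pre-processing step: every \u00XX escape becomes %XX.
jsonStr = jsonStr.Replace("\\u00", "%");

// The serializer now sees only ASCII inside the value, so nothing gets reinterpreted.
var post = JsonSerializer.Deserialize<Dictionary<string, string>>(jsonStr)
           ?? new Dictionary<string, string>();

Console.WriteLine(post["title"]); // Gator bites %f0%9f%98%81
```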

Then, with using System.Web; at the top of the file, you can use:

var title = HttpUtility.UrlDecode(media.Title);

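to make it a nice string with the emoji character in it 👍, since UrlDecode reads each run of %XX escapes as UTF-8 bytes. One catch: it also turns every + into a space, so keep an eye on captions that contain a plus sign.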
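Because UrlDecode treats + as an encoded space, a different technique may suit captions with plus signs: deserialize the file untouched, map each character back to the byte it stands for using the Latin-1 encoding, and decode those bytes as UTF-8. This is an alternative sketch, not the approach above. It assumes .NET 5 or later (for Encoding.Latin1), uses a hypothetical FixMojibake helper, and runs both routes on a made-up caption.

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Web;

// A made-up caption with plus signs, in the export's escaped form.
string json = "{\"title\": \"C++ rocks \\u00f0\\u009f\\u0098\\u0081\"}";

// Route 1: the %XX trick, then UrlDecode. The + signs become spaces.
using JsonDocument percentDoc = JsonDocument.Parse(json.Replace("\\u00", "%"));
string viaUrlDecode = HttpUtility.UrlDecode(
    percentDoc.RootElement.GetProperty("title").GetString() ?? "");

// Route 2 (alternative technique): parse untouched, then turn each char back into
// its byte via Latin-1 and decode the bytes as UTF-8.
// Assumes every char is <= U+00FF, as in the export; anything higher cannot round-trip.
static string FixMojibake(string s) => Encoding.UTF8.GetString(Encoding.Latin1.GetBytes(s));

using JsonDocument rawDoc = JsonDocument.Parse(json);
string viaLatin1 = FixMojibake(rawDoc.RootElement.GetProperty("title").GetString() ?? "");

Console.WriteLine(viaUrlDecode);
Console.WriteLine(viaLatin1);
```

The Latin-1 route keeps the plus signs and never touches literal % characters, at the cost of needing a newer runtime.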