This Is Not An Elaborately Large Quote I Am Just Writing Some TL;DR About Subtitle Formats To Explain Things As Requested By The Masses
While speaking to Eric over at Siren Visual and my bro Shadow Wolf at Supanova this weekend, the topic of various subtitle formats and how they impact visual typesetting and typography came up. Today, I’m going to write about the three main ways you could classify subtitle formats and how they work: text-based subtitles, the DVD IDX/SUB format, and the two (yes, two!) BD subtitle formats, PGS/SUP and TTXML.
I’m super lazy and there is so much variation in text-based subtitling that I’ll sorta skim over this. Text-based subtitles are formats such as SRT or SSA. They are common in the ripping and fansubbing communities, mostly because they can be turned on and off at will. Some are more complex than others. ASS (SSA v4+), for example, can render full text effects as well as vector graphics, and in a container like Matroska it can be packaged with its fonts and used for full typography and visual effects as well as the subtitles themselves. SRT, on the other hand, is a much more basic format: it just stores lines with their times. There are many formats like this, used in many places, so I leave further research to the reader, as this post is mostly aimed at DVD and Blu-Ray.
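To show just how little SRT actually carries, here is a minimal Python sketch of a parser. It assumes well-formed input (a counter line, a timing line, then text lines, with blank lines between cues) and simply passes through any unofficial formatting tags some players accept.

```python
import re

# One SRT timestamp: hours:minutes:seconds,milliseconds
TIME = r"(\d+):(\d{2}):(\d{2}),(\d{3})"
CUE_TIMING = re.compile(TIME + r"\s*-->\s*" + TIME)


def to_ms(h, m, s, ms):
    """Convert captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) cues from SRT text."""
    cues = []
    text = text.replace("\r\n", "\n")
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # lines[0] is the cue counter, lines[1] the timing, the rest is the text
        if len(lines) < 2:
            continue
        match = CUE_TIMING.match(lines[1].strip())
        if not match:
            continue
        start = to_ms(*match.groups()[:4])
        end = to_ms(*match.groups()[4:])
        cues.append((start, end, "\n".join(lines[2:])))
    return cues
```

That is the whole format: times and lines, with no positioning or styling model to speak of.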
DVD uses a format that can be referred to as IDX/SUB, or VobSub after the horrible Windows renderer it used to have. DVD SUB stores each sub-picture as a run-length-encoded bitmap with only 4 colours (2 bits per pixel). That said, the actual colours are up to whoever authored the disc: each of the 4 colours is an index into a 16-colour palette, and each also gets one of 16 contrast (transparency) levels, so only 4 of each can be in use at once. The four colours are for the background, foreground, outline, and shadow. The background colour is generally fully transparent, so your sub-picture overlays your actual video rather than covering it. Common ‘fill’ colours are white, yellow, and pale blue, while black is the most common outline. Most subs have a transparent shadow, although black is fairly common too.
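As a rough illustration, here is a minimal Python sketch of how a player could turn those four pixel values into actual colours. It assumes the bitmap has already been run-length decoded, uses the background/foreground/outline/shadow naming from above, and assumes contrast 0 means fully transparent and 15 fully opaque; the palette in the usage is made up.

```python
# The four 2-bit pixel values of a DVD sub-picture, using the roles above
BACKGROUND, FOREGROUND, OUTLINE, SHADOW = range(4)


def subpicture_to_rgba(pixels, colour_indices, contrasts, palette):
    """Map 2-bit sub-picture pixels to RGBA tuples.

    pixels         -- rows of values 0..3 (already run-length decoded)
    colour_indices -- four palette indices (0..15), one per pixel value
    contrasts      -- four contrast levels (0..15), one per pixel value
    palette        -- the 16 RGB entries for the title (an IDX file lists them)
    """
    if len(palette) != 16:
        raise ValueError("a DVD sub-picture palette has exactly 16 entries")
    lookup = []
    for index, contrast in zip(colour_indices, contrasts):
        r, g, b = palette[index]
        # Scale the 4-bit contrast (0 = transparent, 15 = opaque) to 8-bit alpha
        lookup.append((r, g, b, contrast * 17))
    return [[lookup[value] for value in row] for row in pixels]


# Hypothetical greyscale palette: transparent background, near-white fill,
# dark outline, and a half-transparent shadow
palette = [(i * 16, i * 16, i * 16) for i in range(16)]
rgba = subpicture_to_rgba([[BACKGROUND, FOREGROUND], [OUTLINE, SHADOW]],
                          (0, 15, 1, 1), (0, 15, 15, 8), palette)
```

Change the palette or the index/contrast tuples and the same bitmap comes out in completely different colours, which is why DVD subs look the way the disc decides.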
Blu-Ray, on the other hand, is a bit better. Its sub-picture format (PGS) uses a palette of up to 256 entries, each a full 24-bit colour with its own alpha. This allows for a rather unique ability on a Blu-Ray, if anyone were to take advantage of it (Siren Visual, I am looking at you). Given that a BD disc has so much space on it, yet most of the time that space isn’t utilised even CLOSE to fully, one could use this 24-bit colour (+alpha) to render full ‘soft’ typesetting onto the video. A studio could open up their compositing application of choice, do their thing, and then output a PNG sequence. Convert that to SUP (reducing each frame to the 256-entry palette along the way), mux, and you now have full soft-sub typesetting on a BD release. I have yet to see ANYONE in the industry typeset at all, regardless of method, so this would be a real bonus to release quality.
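The palette step of that PNG-to-SUP idea can be sketched in Python. This assumes a 256-entry limit per palette (my reading of the PGS palette segment) and only does exact indexing: a real converter would also convert each entry to YCbCr and quantize smooth gradients down to 256 colours, which this sketch does not attempt.

```python
def index_frame(pixels, max_colours=256):
    """Convert rows of RGBA tuples into (palette, indexed rows).

    Raises ValueError if the frame needs more entries than the palette holds,
    in which case the frame would have to be colour-quantized first.
    """
    palette = []
    slots = {}
    indexed = []
    for row in pixels:
        out_row = []
        for rgba in row:
            if rgba not in slots:
                if len(palette) == max_colours:
                    raise ValueError(f"more than {max_colours} distinct colours; quantize first")
                slots[rgba] = len(palette)
                palette.append(rgba)
            out_row.append(slots[rgba])
        indexed.append(out_row)
    return palette, indexed
```

Typeset signs with flat fills and soft alpha edges usually need far fewer than 256 distinct RGBA values per frame, so most of the work would be in the compositing, not the format.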
The second format Blu-Ray has, and it seems a lot of people don’t know about this, is TTXML. TTXML shouldn’t be confused with the MP4 format’s Timed Text, which is usually referred to as TTXT. TTXML is the XML-based Timed Text format (also known as DFXP) that Adobe’s Flash captioning component reads, although outside of that it is barely supported by any software I have seen. It is a text-based format similar to Ogg Kate or SSA, only using XML. It is rather basic and, from what I can tell (the spec I found is limited), it has no vector drawing capability, but I assume incorporating SVG isn’t too difficult. It is capable of the usual font selection, bold, styling like outline and shadow, stretching, and basic text animation effects like karaoke via its timing attributes, quite similar to the ASS \t tag if more basic. I have no idea whether many hardware Blu-Ray players support this format, but I’m just putting it out there that it exists.
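For a feel of what such a file looks like, here is a minimal Python sketch that builds a bare-bones Timed Text document with the standard library. It assumes the W3C TTML 1.0 namespace; older Flash-era DFXP files may use a draft namespace instead, and no styling is shown.

```python
import xml.etree.ElementTree as ET

# TTML 1.0 namespace (assumption: Flash-era DFXP files may use an older draft URI)
TT_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_timed_text(cues, lang="en"):
    """Build a minimal Timed Text document from (begin, end, text) cues."""
    ET.register_namespace("", TT_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt")
    tt.set(XML_LANG, lang)  # serialized as xml:lang
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, text in cues:
        # Each caption is a <p> with clock-time begin/end attributes
        p = ET.SubElement(div, f"{{{TT_NS}}}p", begin=begin, end=end)
        p.text = text
    return ET.tostring(tt, encoding="unicode")


doc = build_timed_text([("00:00:01.000", "00:00:03.500", "Hello")])
```

Like SRT, the core is still just text plus begin/end times; the styling attributes are layered on top of that.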
This concludes my walls of text about subtitles; it’s 11:48PM and my fingers are freezing. We’ll see if anyone in the Australian Blu-Ray industry takes an interest.
Grain Is Not A Defect: How Eugenics Improve Video Quality
“Way to grainy/noisy for a 1080p… I’ve seen better BRRips @ 2gb…”
As the above quote shows, a lot of people are under the impression that grain is the same as noise, and that it is a defect. Not just in the scene, which is already known for being incredibly dumb, but also in the AMV and fansub communities. Over the past few months, fansubbers have slowly come to terms with grain (with a few notable exceptions) and at times have gone a little too far, adding grain for no good reason. Granted, adding grain is sometimes necessary, but trolling with it is a bit much. Before the community even thinks about how much grain is necessary, though, I think the various video scenes need to accept that it is NOT a bad thing.
This might come as a surprise to some, but grain makes up a large part of picture quality. Most video is encoded with DCT-based codecs, which break the picture into macroblocks. Without grain or any other small detail, a codec’s quantizer will smooth out the blocks and produce flat areas of solid colour, something x264 counters with its adaptive quantization (AQ). On content such as animation, large flat colours can be how it’s meant to look, but that is certainly not the case with live footage. Another problem is that in large, nearly solid areas the quantizer can be a bit overactive and flatten a very subtle gradient, which shows up as grey/white blocks and lines (banding) in the sky of a clip. Adaptive quantization really helps with that, but so does the small amount of grain often present in sources.
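The banding half of that can be shown with a deliberately simplified 1D model in Python. Real codecs quantize DCT coefficients rather than pixel values, and real grain is not uniform noise, so treat this only as an illustration: a gentle ramp spanning one quantizer step collapses into two flat bands, while the same ramp with fine noise added keeps its local averages close to the original gradient.

```python
import random


def quantize(values, step):
    """Round every sample to the nearest multiple of step (a crude quantizer)."""
    return [step * round(v / step) for v in values]


def window_means(values, size):
    """Average consecutive, non-overlapping windows of the given size."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]


def banding_error(add_grain, step=8, size=32, seed=1):
    """Mean absolute error of local averages after quantizing a shallow gradient."""
    n = 256
    # A very gentle ramp from 96 to 104: it spans exactly one quantizer step
    gradient = [96 + 8 * i / (n - 1) for i in range(n)]
    rng = random.Random(seed)
    if add_grain:
        # Fine grain modelled as uniform noise of +/- half a quantizer step
        source = [v + rng.uniform(-step / 2, step / 2) for v in gradient]
    else:
        source = gradient
    coded = quantize(source, step)
    pairs = zip(window_means(coded, size), window_means(gradient, size))
    errors = [abs(a - b) for a, b in pairs]
    return sum(errors) / len(errors)
```

Without grain every sample snaps to 96 or 104, so the ramp becomes a hard step; with grain the coded samples flicker between the two levels in the right proportions and the eye averages them back into a gradient.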
Noise, on the other hand, is usually the product of poor capturing and is a problem in the source. If a source is noisy, then naturally you denoise it. Grain, however, can be reduced, added, or left alone, depending on how much there is. One simple way to tell whether your clip has noise or grain is by its shape, size, and distribution. Grain tends to be uniform and rather fine (except in anime flashback scenes, where it is significantly larger), and it almost always covers the entire picture. Noise often affects only a small part of the picture, is usually bigger than grain, and the noise ‘chunks’ are irregular; look closely at grain and you will see that it is quite regular. Digital grain is often static as well, so it doesn’t move between frames. That is easy to spot on a slow pan, where the image pans underneath the grain. Noise, however, changes in every frame. Noise also sometimes shows up as colour aberration.
Dirt is a type of noise and is quite rare in modern content. It is usually found in analogue content transferred to digital, mostly older material, and sometimes in video that has already been compressed badly. I have seen it almost nowhere in live content, but in anime it’s often present around hair edges. Usually edge cleaning will fix it: if you have dirt but no other noise, there is no reason to denoise your entire clip; just mask the edges and clean them, or use a dedicated edge-cleaning filter.
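The “mask it and clean it” idea can be sketched in plain Python on a tiny greyscale frame. This is a generic illustration, not any particular AviSynth filter: the edge mask is a simple neighbour-difference threshold, the cleaner is a 3x3 median, and the threshold and radius values are arbitrary.

```python
def edge_mask(img, threshold):
    """Mark pixels whose right or lower neighbour differs by >= threshold."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][min(x + 1, w - 1)] - img[y][x])
            dy = abs(img[min(y + 1, h - 1)][x] - img[y][x])
            mask[y][x] = max(dx, dy) >= threshold
    return mask


def dilate(mask, radius):
    """Grow the mask by radius pixels in every direction."""
    h, w = len(mask), len(mask[0])
    grown = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        grown[yy][xx] = True
    return grown


def median3x3(img, y, x):
    """Median of the 3x3 neighbourhood, clamping coordinates at the borders."""
    h, w = len(img), len(img[0])
    values = sorted(
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )
    return values[4]


def clean_edges(img, threshold=32, radius=1):
    """Median-filter only the pixels near strong edges; leave the rest untouched."""
    mask = dilate(edge_mask(img, threshold), radius)
    return [
        [median3x3(img, y, x) if mask[y][x] else img[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

Only the masked band around edges is touched, so grain and detail in flat areas survive untouched, while a median keeps a straight edge sharp and knocks out isolated specks sitting right next to it.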
To get back on track, grain is not bad. Complaining about grain makes you look dumb and blind, and getting rid of it makes content look sterile as hell. Fine grain actually looks good, is rarely even noticed, and makes a picture look significantly more natural. There is absolutely no need to remove or tamper with it. The problem is that people are stupid. I’m not going to actually talk about selective breeding, but I think the title gets the point across. There are lots of good uses for grain, such as debanding without resorting to dither (which gets banded by the quantizer anyway), and overall it does look better. Now stop fucking with it.
FLV+VP6 Seemed Like A Good Idea At The Time But Now It’s More Like That Trip To Vegas
I think everyone who actually reads this is aware that FLV is one of the worst containers ever used in the history of pretty much anything. For starters, it absolutely abuses timecodes: FLV stores timestamps in whole milliseconds, so a timecode v2 dump of a 30fps clip shows frame durations jittering between 33 and 34 ms. When I look at a dump and see lots of numbers hovering around 30fps, it’s pretty much guaranteed to actually be 30fps.
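A minimal Python sketch of why that works, assuming you already have per-frame timestamps in milliseconds (for example from a timecode v2 dump): rounding to whole milliseconds makes a 30fps clip alternate between 33 and 34 ms, but the average rate over the whole clip still comes out at the real rate. The list of “common” rates to snap to is my own choice.

```python
from fractions import Fraction

COMMON_RATES = [Fraction(24000, 1001), Fraction(24), Fraction(25),
                Fraction(30000, 1001), Fraction(30), Fraction(50),
                Fraction(60000, 1001), Fraction(60)]


def millisecond_timestamps(fps, frames):
    """What a constant-rate clip looks like when timestamps are rounded to whole ms."""
    return [round(i * 1000 / fps) for i in range(frames)]


def estimate_fps(timestamps_ms):
    """Average frame rate over the whole clip, ignoring per-frame jitter."""
    span = timestamps_ms[-1] - timestamps_ms[0]
    return Fraction(1000 * (len(timestamps_ms) - 1), span)


def snap_to_common_rate(fps):
    """Pick the nearest standard frame rate."""
    return min(COMMON_RATES, key=lambda rate: abs(rate - fps))


ts = millisecond_timestamps(30, 301)
deltas = {b - a for a, b in zip(ts, ts[1:])}  # the 33/34 ms jitter
```

Looking at individual frame durations makes the clip look variable; looking at the clip as a whole shows it is plain constant 30fps.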
Another thing FLV likes to do is give itself some random-as-hell resolutions, such as 583×437. That can’t happen with H.264, which is generally a YV12 codec: chroma subsampling is done at 4:2:0, which forces both dimensions to be divisible by 2 (mod2). VP6, however, allows pretty much whatever you want.
The video I was asked to deal with today was VP6 at that resolution, at 30fps, in FLV. The actual problem with it was some luma blending. I figured a simple MergeChroma(last.Trim(1,0)) (for the non-AviSynth people: cut the first frame off the chroma so that it’s no longer a frame behind the luma) would do the job; however, it turned out to be more difficult than I thought.
See, Japan has this love of REALLY BAD framerate conversions, and one of the worst conversion methods blends the luma and chroma channels to interpolate motion and whatnot. This was most definitely the case here, and eventually I decided to just freeze-frame some really bad bits and leave the rest as is, seeing as at 30fps a single frame is hardly going to be noticeable to the regular human eye.
After messing around a bit more, and with the help of Kuukunen, we found that the blending followed a 4:5 pattern: definite proof of a blended, interpolating framerate conversion from 24fps to 30fps. I think this is where I would like to give P.A. Works and whoever else worked on this a big warm FUCK YOU. Naturally I shouldn’t be ripping things, but it’s available for free on their site and I happen to be involved in an English translation project for it.
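Where a 4:5 pattern comes from can be modelled in a few lines of Python. This assumes plain linear blending between the two nearest source frames; I don’t know the exact method used on this release, and smarter motion-interpolating converters can end up touching every frame. It simply shows that 24→30 conversion repeats every 5 output frames (covering 4 source frames), with only one clean frame per cycle.

```python
from fractions import Fraction


def blend_weights(out_index, src_fps=Fraction(24), dst_fps=Fraction(30)):
    """Source frames (and weights) mixed into one output frame by linear blending."""
    position = out_index * src_fps / dst_fps  # exact position on the source timeline
    base = position.numerator // position.denominator
    frac = position - base
    if frac == 0:
        return [(base, Fraction(1))]  # lands exactly on a source frame: clean
    return [(base, 1 - frac), (base + 1, frac)]


for k in range(6):
    print(k, blend_weights(k))
```

The same ratio applies to 24000/1001 → 30000/1001, so NTSC-style rates give the identical pattern.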
Now, as I’ve been talking about this on IRC with a few people, some of them commented, “Just decimate the 5th frame, seeing as that’s the bad one.” The issue with blended interpolation framerate conversions is that they don’t just blend 2 frames to make an extra one; they mess with ALL frames to preserve smoothness of motion, introducing blending along the way. That means there is effectively nothing one can do about it, although Kuukunen suggested the following, which basically takes the 2 worst frames in any set of 5 and blends them together to get back down to 24fps, at the cost of some funky 3-way blends.
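# Kuukunen's suggestion: keep frames 0-2 of each 5-frame cycle and
# average the last two (the worst blends), turning 5 frames into 4.
s = last
s0 = s.SelectEvery(5, 0)
s1 = s.SelectEvery(5, 1)
s2 = s.SelectEvery(5, 2)
s3 = s.SelectEvery(5, 3)
s4 = s.SelectEvery(5, 4)
# 50/50 blend of frames 3 and 4 of each cycle
Interleave(s0, s1, s2, s3.Overlay(s4, opacity=0.5))
# 4 of every 5 frames remain, so 30fps becomes 24fps; flag it as 23.976
AssumeFPS(24000, 1001)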
I can’t say I liked the result of that, but either way Japan has proven once again that it knows nothing about quality video mastering. The industry strikes again, I guess. I could maybe write a letter to the studio informing them of how they’re Doing It Wrong™, and they might even send me really low-res lossless clips and ask for 1080p upscaled H.264, but I don’t see that happening here.
On the side, I happen to be turning 21 today, and if anyone feels like contributing to something they should message me on Rizon ( ´∀`)
Doing It Wrong: Hardware Support, Null Frames, And Why You’re Overscanning Your Usefulness
Sure is a lot of updating from me recently. If you actually read and/or appreciate these pages upon pages of tl;dr, leave a comment with your thoughts. It’s a bit depressing to see high reading stats and only ~30 comments across the entire blarg.
Everyone knows some guy with a DivX player, right? Those guys who pop in CDs with XviD or DivX DVD rips and whatnot on them and watch them on their TV, who haven’t yet figured out streaming Matroska or watching MP4 AVC encodes? A lot of anime encoders still use XviD for “hardware compatibility” reasons. Some use H.264 but still in the AVI container, like “timecop”, but they are so utterly beyond help that I don’t see any point in commenting on them here. There seem to be an awful lot of issues with these supposedly hardware-compatible encodes, though. I’ll attempt to explain why, and hopefully people will stop using XviD/AVI or at least make some concessions.
The three main fuckups I see in AVI encodes are overscan, variable framerates, and, to a lesser degree, resolution. I’ll start with overscan. Overscan happens on older (and some newer) TVs: effectively anything that uses a cathode ray tube (i.e. the bulky TVs, not LCD or plasma), along with flat panels set to overscan mode for whatever silly reason. An image on a CRT can be broken into three parts: title-safe, action-safe, and overscan. Title-safe is the innermost part of the frame, where everything is certain to be displayed correctly. Action-safe is a slightly larger area where some things may be cut off but which is usually OK, especially horizontally as far as subtitles go. Vertically, subtitles should always be in the title-safe region, which most often means a vertical margin of 5% of the vertical resolution. For example, for 720p that would be 0.05 * 720, or 36px. About 5.5%, however, is the optimum reading zone for most people across almost all reading distances and font sizes. Overscan is the part that is guaranteed to be cut off.
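The margin arithmetic for the usual resolutions, as a tiny Python sketch using the 5% and 5.5% figures above. The result is what you would put as the vertical margin (MarginV) of an ASS style, assuming the script resolution matches the video.

```python
def vertical_margins(height, title_safe=0.05, comfortable=0.055):
    """Return (title-safe margin, 'optimum reading' margin) in whole pixels."""
    return round(height * title_safe), round(height * comfortable)


for height in (480, 576, 720, 1080):
    print(height, vertical_margins(height))
```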
I’m going to go ahead and assume that anyone reading this is somewhat knowledgeable about digital subtitling and familiar with Aegisub. One of the lesser-known features of Aegisub is the overscan mask. It can be enabled under Video -> Show Overscan Mask, and a blue mask will appear over your video. The darkest blue part is the overscan area. The lighter, inner blue part is the action-safe area. The clear, unmasked part of the picture is the title-safe area. Your subtitles should always be within the title-safe area vertically, and at least within the action-safe area horizontally. While you would imagine it preferable to keep everything within the title-safe region, on widescreen video the action-safe zone is actually quite wide, and very few CRTs will actually cut it off. Using it also makes your subs look less crushed and bulky, and avoids splitting a line in two when going a tiny bit wider would fit it on one. You must always keep clear of the overscan mask, however.
Daiz wrote a decent page on some popular XviD re-encoders whose releases appear on some torrent sites and supposedly support hardware playback, yet don’t. It shows the masks in action and illustrates how each screenshot fails to stay within the safe areas. The supposed point of most of these re-encodes is to provide compatibility with hardware players, but every single one I have seen so far breaks 1, and occasionally 2 or even all 3, of the things I mentioned above.
The AVI container doesn’t support variable frame rate. There is, however, a nifty feature called drop frames or null frames, which allows a frame to be dropped so that the previous one keeps showing. This gives you a way to ‘fake’ VFR; however, it can make the framerate exceedingly high if you have somewhat arbitrary rates, as the container rate must be a common multiple of all the different rates, most often 120fps. By using the null-frame trick, one can make a faked VFR AVI encode. There is only one problem: hardware players BREAK HORRIFICALLY on null frames. They are completely unsupported in every player I have ever seen them on. So either you get fucked motion, or you don’t get hardware compatibility. Alternatively, you can duplicate frames up to a constant rate and make your file roughly 4x bigger, but I can’t say I’ve ever seen someone do this. Usually people use lower-resolution XviD encodes, so being twice the size of an HD Matroska encode yet far poorer quality is somewhat silly. This especially goes for people backing up their DVDs: if it’s VFR, don’t ever use AVI; you will ruin your motion if you intend to watch it on a hardware player.
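The arithmetic behind that container rate, as a small Python sketch: the rate has to be the least common multiple of every segment’s rate, and each real frame is then followed by (container rate / content rate − 1) null frames. It uses exact fractions so NTSC-style rates such as 30000/1001 behave; `math.lcm` needs Python 3.9 or newer.

```python
from fractions import Fraction
from math import gcd, lcm


def common_rate(rates):
    """Smallest rate that every given frame rate divides evenly (LCM of fractions)."""
    fracs = [Fraction(r) for r in rates]
    # For fractions in lowest terms: lcm(a/b, c/d) = lcm(a, c) / gcd(b, d)
    numerator = lcm(*(f.numerator for f in fracs))
    denominator = 0
    for f in fracs:
        denominator = gcd(denominator, f.denominator)
    return Fraction(numerator, denominator)


def null_frames_per_real_frame(container_rate, content_rate):
    """How many null frames follow each real frame at this content rate."""
    repeat = Fraction(container_rate) / Fraction(content_rate)
    if repeat.denominator != 1:
        raise ValueError("content rate does not divide the container rate")
    return int(repeat) - 1
```

For 24 and 30fps sections this gives 120fps, with 4 null frames after each 24fps frame and 3 after each 30fps frame. Mix a true 24fps section with a 29.97fps one, though, and the common rate jumps to 30000fps, which is the “exceedingly high” case.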
Another limitation of hardware players is their strange need for mod16 resolutions, that is, the horizontal and vertical resolutions must be perfectly divisible by 16. That’s due to the way DCT codecs work: a macroblock is 16×16 in these codecs. The DivX specification’s maximum resolution is 720×480, and as such the most ‘common’ resolution is 704×396, to keep to 16:9. The problem with 704×396, however, is that it is not mod16 (396 / 16 = 24.75). That would require 704×400, which is a lot more common now, but back when these low resolutions were mostly used it was far from it. It seems to have gotten more of a following now, probably because the XviD codec is far more efficient at mod16 resolutions.
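A quick Python sketch of picking the nearest mod16 height for a given width and aspect ratio, and of how much aspect distortion that costs. It assumes square pixels, so it doesn’t cover anamorphic DVD sources.

```python
from fractions import Fraction


def is_mod(value, n=16):
    """True if value is divisible by n."""
    return value % n == 0


def mod_height(width, aspect, n=16):
    """Height closest to width / aspect that is a multiple of n."""
    exact = Fraction(width) / Fraction(aspect)
    return n * round(exact / n)


def aspect_error(width, height, aspect):
    """Relative aspect-ratio distortion of width x height versus the target aspect."""
    return abs(Fraction(width, height) / Fraction(aspect) - 1)
```

For 704 wide at 16:9 the exact height is 396, which is not mod16; the nearest mod16 height is 400, which squashes the picture by exactly 1%, well below what anyone will notice on a TV.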
The final issue I’d like to bring up is the use of AVC in AVI. Putting something as nice as the AVC codec into something as hacky and broken as the AVI container, which doesn’t fully support the codec in the first place, is just sheer stupidity. I would like to warn everyone against using Komisar and BugMaster’s shitty x264vfw. No matter what anyone says, you do NOT need AVC in AVI and you do NOT need x264vfw. If you think you need these things, you need to educate yourself further.
Colour Matrices And Why Typesetters Make Encoders Look Bad
It seems that a lot of people still have issues with when and how colour matrix conversions should be used. After talking to some fellow encoders and a few typesetters I figured it was about time to write something useful here again.
There are two main colour matrices used in pretty much everything: ITU-R Rec.709 and ITU-R Rec.601. Contrary to popular belief, both use what we call TV levels: on the luma scale, black is 16 and white is 235, as opposed to PC levels of 0 and 255 respectively. Rec.709 (also called bt709, which I will use from now on) is used on HD material, that is, anything with MORE than 576 pixels vertically, while bt601 is used on SD material, or anything up to and including 576 vertical pixels.
The trouble people seem to have is WHERE to use these. You can say that anything at each of those resolutions gets the appropriate matrix, but what happens when someone takes a screenshot? Video is almost always encoded in the YV12 colourspace, yet screenshots are RGB. That conversion needs to use the correct matrix; however, most media players will assume bt601. For playback itself they will either read the stream info or assume based on resolution, but for screenshots that usually isn’t the case.
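To see how far off a wrong-matrix conversion is, here is a minimal Python sketch using the standard BT.601/BT.709 luma coefficients and TV-range scaling. It ignores chroma subsampling and works on a single pixel, and pure red is just an example colour: encoded with bt709 and decoded back with bt709 it comes out essentially unchanged, while decoding the same values with bt601 (what many players do for screenshots) visibly shifts it.

```python
# Luma coefficients (Kr, Kb) for the two matrices
MATRICES = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}


def rgb_to_ycbcr(r, g, b, matrix):
    """8-bit full-range RGB -> 8-bit TV-range Y'CbCr (Y 16-235, Cb/Cr 16-240)."""
    kr, kb = MATRICES[matrix]
    kg = 1 - kr - kb
    r, g, b = r / 255, g / 255, b / 255
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr)


def ycbcr_to_rgb(y, cb, cr, matrix):
    """Inverse of rgb_to_ycbcr, clamped to 0-255."""
    kr, kb = MATRICES[matrix]
    kg = 1 - kr - kb
    y = (y - 16) / 219
    cb = (cb - 128) / 224
    cr = (cr - 128) / 224
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return tuple(min(255, max(0, round(255 * c))) for c in (r, g, b))


yuv = rgb_to_ycbcr(255, 0, 0, "bt709")      # red, as stored in an HD stream
right = ycbcr_to_rgb(*yuv, "bt709")         # decoded with the matching matrix
wrong = ycbcr_to_rgb(*yuv, "bt601")         # decoded assuming bt601
```

Whites, blacks, and greys come out the same either way, which is why the error is easy to miss; saturated colours are where it shows.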
As another example, what happens if you have something broadcast on an HD stream at bt709 but it’s really upscaled, and you decide to encode it at, say, 480p? Which would you use? You can use the fancy ColorMatrix() plugin for AviSynth to convert bt709 to bt601, but why not just flag the stream? It’s my own personal practice to convert SD material, because I know people will screenshot it, and if it’s an SD encode there is no point keeping the other matrix.
Furthermore, what happens when you need to process video? Most programs that work in RGB, say for encoding overlay files with an alpha channel, will assume bt601; I know VirtualDub and Adobe After Effects do. Autodesk Inferno has an option to set it, although I’d assume AE does as well, but then again it isn’t something the industry likes to think about. So let’s assume we have a source at bt709, and it’s HD. We then do some processing on it and output a lossless file for someone else to typeset onto; the typesetting is then encoded as RGB32 and overlaid with its alpha channel back onto the original clip. The following is a flowchart of sorts of how this normally goes, resulting in incorrect colours:
Source (bt709 YV12) -> Processing (bt709 YV12) -> Lossless AVC (bt709 YV12) -> Typeset (bt601 RGBA) -> Overlayed (bt709 YV12)
You could of course encode your lossless file as, for example, Lagarith at RGB24, but is there going to be a colourspace conversion in that? Yes. Are you going to do it yourself? No. The encoder will do it and assume bt601, and the colours are then off. The correct way is to make the source going into the Lagarith (LAGS) encoder RGB already, converted with the correct matrix, and pass that to the typesetter, who then has correct colours and outputs correct RGBA for you, and everything is hunky dory. Example AviSynth code that goes right at the end of the script:
ConvertToRGB24(matrix="Rec709")
That’s it. LAGS now gets RGB data directly, and there is no problem. Because people get this wrong, though, typesetters bork the colours slightly (it’s really quite unnoticeable to most people, and even then only on certain hues) and then the encoder looks stupid.
I think deciding which matrix to use in final encodes is really up to personal preference; as long as the stream is flagged correctly there is no problem. All that matters is that the same colourspace is used consistently or, if it changes, that any conversions are done correctly. Using the ColorMatrix() filter is optional: if you intend to output bt601, it’s a great way to start off in that colourspace (it goes at the top of your avs script, before anything that changes the frames in any way), and if not, you should flag bt709 in x264 (--colormatrix bt709) instead. At encoding time, flagging bt601 is also good (x264 calls it smpte170m or bt470bg rather than bt601). I’d say for SD shows a bt601 encode is a better idea due to the aforementioned screenshot issue, but for HD material it’s really OK to use bt601 as well. Rec.709 is of course preferred for HD content.
I hope this has made it a little clearer WHERE to put conversions, and maybe one or two people will actually learn something.
Racism? In MY Media Player? It’s More Likely Than You Think
Normally I am fairly lazy when it comes to muxing anything I encode; I have some shell scripts that do everything I need. After my CPU fiasco the other day, I have been rather edgy about what I do on my system, so a bunch of things I normally have open are being closed while I encode, with the encode running in a screen session in case X dies on it due to RAM abuse or something. One thing I did was stop using my muxing script as well, although I don’t really see why anymore; I figure I had a reason at some stage. Yesterday I used MKVMergeGUI (mmg) to mux an encode, and today another. Because my script normally forces some settings, it didn’t occur to me to double-check them in mmg. That was when I found out the horrible truth: several popular media players are racist against certain mimetypes.
Is there even a word for mimetype discrimination? I can’t think of one, but I’m sure Darkhold would if I cared to ask. Now, anyone who has experience muxing fonts for soft subtitle styling will know that you have to be picky about which mimetype mkvmerge, or whichever other muxer you use, writes for each font. The works-for-everything mimetype is application/x-truetype-font; however, by default OpenType fonts get written as application/vnd.ms-opentype, and some others get even stranger. From what my brief testing showed, MediaPlayer Classic – Home Cinema (MPC-HC) works correctly regardless. ZoomPlayer and MPlayer seem to break on that but are fine with the general application/x-font or application/x-truetype-font. VLC breaks on everything besides application/x-truetype-font, and it seems GStreamer-based splitters/players fall back to the font file extension to get it right, which is interesting, as normally GStreamer Does It Wrong(tm).
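If you mux from your own scripts, you can force the mimetype instead of trusting the muxer’s default. A minimal Python sketch that builds an mkvmerge command line: `--attachment-mime-type` applies to the `--attach-file` that follows it, so the pair is repeated per font. The file names are hypothetical and the mimetype is left as a parameter so you can use whichever one you settle on.

```python
def font_attachment_args(font_paths, mime_type):
    """mkvmerge arguments that attach each font with an explicit mimetype."""
    args = []
    for path in font_paths:
        # The mimetype option only applies to the next --attach-file
        args += ["--attachment-mime-type", mime_type, "--attach-file", str(path)]
    return args


def mux_command(output, inputs, fonts, mime_type):
    """Full mkvmerge argv: output, input files, then the font attachments."""
    return ["mkvmerge", "-o", output, *inputs, *font_attachment_args(fonts, mime_type)]


cmd = mux_command("out.mkv", ["video.mkv", "subs.ass"], ["a.ttf", "b.otf"],
                  "application/x-truetype-font")
```

Run it with `subprocess.run(cmd, check=True)` once it looks right; an OpenType font then gets the same mimetype as the TrueType ones instead of application/vnd.ms-opentype.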
I did find, however, that MPlayer is able to fall back to the system fonts if it cannot find what it needs. This made testing a bit annoying, but once I had spoken to Greg, the lead libass developer, everything became clear. So, in summary, always mux your fonts as application/x-font to ensure they work correctly on all good players, even if those players discriminate against your fonts’ native mimetypes. Chances are I am the only one who would ever forget this anyway, but it’s always good to point things out to people who don’t know.
Bonus Points: Find my discriminatory comment in this post.
On the Topic of Stupid
“If I master a movie with Windows Movie Maker and I burn it to a DVD-R as a DVD, how can I get it back onto the computer?”
…was the question that I was just asked. Besides the stupidity of using WMM, what really got to me here was the way he tried to sound so knowledgeable about the topic. I would imagine that if he was able to encode the thing, he would have a copy already on his computer, but apparently not. Short post that isn’t really important but is just a minor vent about encoders being retarded.
In that regard, a Certain Man in Sydney whom I will not name has gone and shown himself to be just as stupid as I suspected by going on about how great encoding on servers is. Sure, it’s faster on the upload, and possibly on the processing depending on the gear, but that doesn’t make it better. For one thing, with CLI encoding on a server it is almost impossible to watch over the encode and ensure quality. YATTA isn’t exactly made for server use either. Fuck you, CommieSubs, for bothering to put out shit when all you do is kill the scene.
Wafflesub
Earlier today, ScR3WiEuS was complaining about the current Aegisub splash screen being rather weird, especially as he got strange looks from people every time he opened it. Personally, I think movax’s imouto is fine, if a bit dull, but it’s no big deal. Screw requested that we replace it with waffles. Never one to back down from lulz, I came up with this patch. It works on the WINDOWS version of Aegisub, specifically r2494.
The patch includes, but is not limited to, waffles, maple syrup, butter, strawberries, and Aegisub. It does not include delicious imouto. The actual replacement image is as follows; the two others are options I guess I could use. It can be applied to Linux as well if/when I care. Also, I suck at burn effects, and the warp is kinda wonky.
I Hate Typesetters
I really hate it when typesetters don’t set the script resolution properly. Especially when I just burnt 350MB of my bandwidth on an encode and script that aren’t that great. Not to say it was bad video-wise, but I saw no reason for the HUEG filesize, given that on my 1080p monitor the SD encode looks about the same.
What really shits me off is when the subs are set for 704×400 (the script’s PlayResX/PlayResY) on an HD encode, and libass, broken as fuck but fairly good at what it does, scales the \pos tags in the sub track as floats. VSFilter, on the other hand (I’m assuming here, but it seems the most likely explanation), uses integers. For the non-coders: when libass scales positioning tags, it creates decimals, while VSFilter truncates them down to whole numbers. That, or VSFilter is capable of rendering at decimal positions, whichever. The point is that libass can’t render to decimals and therefore shits itself.
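Where those decimals come from, as a small Python sketch: scaling a \pos from a 704×400 script resolution to 1280×720 video. The example coordinates are made up, and which of the two results each renderer actually uses is the assumption stated above, not something this code checks.

```python
from fractions import Fraction


def scale_pos(x, y, script_res, video_res):
    """Scale a \\pos(x,y) from the script's PlayResX/PlayResY to the video size."""
    sx = Fraction(video_res[0], script_res[0])
    sy = Fraction(video_res[1], script_res[1])
    return x * sx, y * sy


x, y = scale_pos(100, 50, (704, 400), (1280, 720))
print(float(x), float(y))   # exact scaled position, with a fractional x
print(int(x), int(y))       # the same position truncated to whole pixels
```

Setting PlayResX/PlayResY to the video’s actual resolution before typesetting means the scale factor is 1 and there is nothing to round in the first place.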
It’s really not hard for typesetters to fix this. Static and Eclipse do it all the time, and Menclave are starting to. Shinsen, on the other hand, seem to think there is nothing wrong with it. From a Windows user’s point of view, yeah, it’s fairly OK, as 99% of Windows users will be using VSFilter anyway. However, SHS DO tell people to use mplayer on Linux/OSX/whatever, which uses libass and thus renders incorrectly.
Clearly not everyone will be pissed at this, but for the increasingly large number of mplayer users it’s goddamn annoying, and so easy for the group to fix. You could argue that users can demux it and fix it themselves, but honestly, is it that hard for the typesetter to do? I normally get the SD encode and accidentally grabbed the HD one today; I guess I learnt my lesson.
TL;DR fuck shinsen and any other typesetters that don’t set the res properly.
Matroska Attachment Scripting
Earlier today, martino asked in #darkhold about extracting large amounts of fonts and other attachments from mkv files. Noting that there is no mkvextract GUI on Linux, I went ahead and wrote a poorly formatted script. Copy it wherever and run it however you like; the syntax is ./mkvattachex.sh infile.mkv a b c, where a, b, and c are optional. a is the number of video tracks, b the number of audio tracks, and c the number of subtitle tracks. These are needed so that the script can tail properly, although I am pretty sure I can fix this. All of them default to 1, so for a standard file with one of each they can be omitted. The script will extract all attachments, which generally means all included fonts. I have no idea whether it will kill your system; if it does, as always, blame movax.
Here be the actual script; I’ll be updating it later on with proper functionality:
http://ophion.pastebin.com/f8437ff2
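For reference, one way to avoid counting tracks at all is to read the attachment IDs straight out of `mkvmerge -i`. This is a Python sketch, not the script above. It assumes the identify output format shown in the test data (it has varied between MKVToolNix versions) and the older `mkvextract attachments <file> <id>:<name>` argument order; newer MKVToolNix releases document the file name before the mode instead.

```python
import re

# Assumed `mkvmerge -i` line format for attachments
ATTACHMENT = re.compile(
    r"^Attachment ID (\d+): type '([^']*)', size (\d+) bytes, file name '(.*)'$"
)


def parse_attachments(identify_output):
    """Return (id, file name) pairs from `mkvmerge -i` output."""
    found = []
    for line in identify_output.splitlines():
        match = ATTACHMENT.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(4)))
    return found


def extract_command(mkv_path, attachments):
    """mkvextract command that writes every attachment under its own name."""
    specs = [f"{att_id}:{name}" for att_id, name in attachments]
    return ["mkvextract", "attachments", mkv_path, *specs]
```

Feed it the text from `subprocess.run(["mkvmerge", "-i", path], capture_output=True, text=True).stdout`, and however many video, audio, or subtitle tracks the file has stops mattering.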
