Article tags: "subtitles"

Are you always happy with the way you present qualitative data, especially audio or video data? If not, read on. The main challenge of video data lies in its dynamic nature: it prevents us from simply printing some tables, code descriptions, and coding rules. Subtitling, however, lets you show how you work with your material while you (or anyone else) watch the source material at the same time. This requires an overlay technique, and strictly speaking it is no longer subtitling. What you actually do is overlay text (or symbols, …) according to rules that you define and that fit your research project. You can place different codes or categories in different parts of the screen, which is what makes this method so fascinating. Overlay techniques offer many possibilities:

  • presentation of video/audio data at conferences or for discussion
  • add-on to theses (Diploma, MA, PhD, etc.) or journal articles to demonstrate your work
  • teaching qualitative data analysis
  • evaluation of research
  • offering a consistent chain of reasoning, beginning with the simplest code and ending with a final interpretation

This procedure allows you to overlay not only codes, metacodes, and other categories but also transliterations (true to audio), symbols, or anything else. It requires an export plug-in that can be used from within your qualitative data analysis software. Unfortunately, no software is capable of this at the moment. Therefore, a short description of how to achieve good-looking results follows below.

My software of choice for coding qualitative data is AQUAD, mainly for personal reasons (I am involved in its development), but you can choose whatever you like. In particular, there is an R package ("RQDA") that also supports QDA. Just explore it. What you need is a codefile you can read and a scripting language to convert it to a subtitle format. My choice is always the .ass format (Advanced SubStation Alpha), because it allows separate styles and many more features. I do not know of any other subtitle format that offers so many opportunities. Aegisub is the editor of choice for creating and tweaking .ass files. One of its main fields of application is karaoke, so it offers a great variety of features for defining styles (font, font size, color, movement, rotation, etc.; all of these are covered by the .ass format). Aegisub is a great tool and runs on most operating systems and platforms. Now for the procedure:

  1. Code your material or make a transcription (true to audio) or both. Use different code types (e.g. codes, meta-codes that summarize "ordinary" codes on a more abstract level, etc.) so that you have every possibility to fully answer your research question(s).
  2. Export your codefile together with the time codes to .ass format (save it as UTF-8, because Aegisub works with UTF-8).
  3. The conversion should assign a different style to each code.
  4. Define the styles within Aegisub; do whatever is necessary to make it look nice and support your work.
  5. Adjust time codes, styles, etc. within Aegisub.
  6. Play it externally (e.g. with mpc-hc, VideoLAN, MPlayer, XBMC, etc.). If you need another software player, install the ffdshow filter package, which comes with an internal plugin that plays .ass natively.

Only a very simple R script for converting AQUAD 6 .aco codefiles to .ass format can be offered here. The script is still a work in progress, but it works (at least for me). It assigns a separate style to each code, but all styles are initially identical, so you have to tweak them in Aegisub. The script should be self-explanatory; otherwise contact me for support. Make sure you set the screen resolution and the frame rate (fps) to match your video.
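As a rough illustration of what such a conversion does, here is a minimal Python sketch (it is not the R script mentioned above and does not read .aco files). It assumes the coded segments are already available as start time, end time, code name, and overlay text; the `Segment` class, the function names, and the placeholder style values are hypothetical. It also assumes that code names contain no commas, because they are used as style names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One coded stretch of the recording (hypothetical input structure)."""
    start: float  # seconds
    end: float    # seconds
    code: str     # code name, reused as the .ass style name
    text: str     # text to overlay (often the code name itself)


def ass_time(seconds: float) -> str:
    """Format seconds as H:MM:SS.cc, the time notation used in .ass files."""
    cs = round(seconds * 100)  # .ass stores hundredths of a second
    h, rest = divmod(cs, 360000)
    m, rest = divmod(rest, 6000)
    s, cs = divmod(rest, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


STYLE_FORMAT = (
    "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
    "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, "
    "ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, "
    "MarginL, MarginR, MarginV, Encoding"
)
# Placeholder look shared by every style; tweak each style later in Aegisub.
STYLE_BODY = ("Arial,28,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,"
              "0,0,0,0,100,100,0,0,1,2,1,2,10,10,10,1")
EVENT_FORMAT = ("Format: Layer, Start, End, Style, Name, MarginL, MarginR, "
                "MarginV, Effect, Text")


def segments_to_ass(segments, width=720, height=576):
    """Build the text of an .ass file with one (identical) style per code."""
    codes = sorted({seg.code for seg in segments})
    lines = ["[Script Info]", "ScriptType: v4.00+",
             f"PlayResX: {width}", f"PlayResY: {height}", "",
             "[V4+ Styles]", STYLE_FORMAT]
    lines += [f"Style: {code},{STYLE_BODY}" for code in codes]
    lines += ["", "[Events]", EVENT_FORMAT]
    for seg in sorted(segments, key=lambda s: s.start):
        lines.append(f"Dialogue: 0,{ass_time(seg.start)},{ass_time(seg.end)},"
                     f"{seg.code},,0,0,0,,{seg.text}")
    return "\n".join(lines) + "\n"
```

The returned text can be written with `open(path, "w", encoding="utf-8")` and then opened in Aegisub, where each style can be given its own position, font, and color.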

Now you can play the video together with the subtitles, or you can hardcode the subtitles into the video. To hardcode subtitles into video streams, use VirtualDub together with the "subtitler" filter or the filter that comes directly with VirtualDub.

Final notes: You can use not only ordinary text but also symbols. Text can be moved and rotated (see the Aegisub manual for details). With ASSDraw, which is part of Aegisub, you can create small vector graphics that can be used as well (and moved, rotated, or whatever you like). If you want to produce a DVD, just read this previous posting about subtitling and DVD authoring.

Basic procedure

Producing subtitles (without expensive software) is not that difficult. One of the best choices is Aegisub, which works with the .ass file format and lets you use self-defined styles for different subtitles. The .ass file can later be used for Blu-ray disc subtitle rendering with avs2bdnxml or for DVD authoring. The following is the call for PAL (720 x 576 resolution); you have to use Avisynth as a frameserver:

avs2bdnxml -t Undefined -l und -v 576i -f 25 -a1 -p0 -b0 -u0 -m3
           -o target-xml-file.xml source-avi-via-avisynth.avs

With Avisynth as a frameserver, you can use almost any file format as a base for rendering subtitles. The .avs file for your language file looks like this:

LoadPlugin("VSFilterMod.dll")
LoadPlugin("PATH-TO-dvd2avi-OR-dgmpgdec\dvd2avi_dgmpgdec158\DGDecode.dll")
MPEG2Source("d2v-fileproduced-by-dvd2avi-or-dgmpgdec-if-you-use-a-dvd.d2v")
MaskSubMod("subtitle-file-in-ass-file-format.ass",720,576,25,85246)

As you can see, you need the VSFilterMod.dll and DGDecode.dll plugins. If you work with a DVD, you also have to prepare a .d2v file. The MaskSubMod() call takes the PAL resolution of 720 x 576, 25 fps, and the number of frames (up to the end of the last subtitle). For DVD authoring, you have to reduce the color depth, because the DVD specification allows only 3+1 colors (one color is transparency). BDSup2Sub.jar is a good choice for that.
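The frame count for the MaskSubMod() call can be derived from the end time of the last subtitle. The following Python sketch shows one way to do that; the function names are hypothetical, and it assumes the usual .ass layout in which the end time is the third comma-separated field of each Dialogue line.

```python
import math
import re
from fractions import Fraction

ASS_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{2})")


def ass_time_to_centiseconds(stamp: str) -> int:
    """Convert an .ass time such as 0:00:17.82 to hundredths of a second."""
    match = ASS_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an .ass time: {stamp!r}")
    h, m, s, cs = (int(part) for part in match.groups())
    return ((h * 60 + m) * 60 + s) * 100 + cs


def frames_until_last_subtitle(ass_text: str, fps=Fraction(25)) -> int:
    """Number of frames needed to reach the end of the last Dialogue line."""
    ends = [
        ass_time_to_centiseconds(line.split(",", 3)[2])
        for line in ass_text.splitlines()
        if line.startswith("Dialogue:")
    ]
    if not ends:
        return 0
    # Exact rational arithmetic avoids rounding surprises before ceil().
    return math.ceil(Fraction(max(ends), 100) * Fraction(fps))
```

For PAL the default of 25 fps applies; for NTSC, `Fraction(30000, 1001)` can be passed as `fps`.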

Replace subtitles after authoring

However, not all DVD authoring suites (even professional ones) can import all 3+1 colors. Adobe Encore, for example, works internally with 3+1 colors but can import only 2+1. A clear bug! But it seems nobody at Adobe wants to fix it. However, you can replace the subtitles after authoring:

  1. Demux the authored DVD with PgcDemux.
  2. Produce good-looking subtitles with avs2bdnxml and BDSup2Sub.
  3. Check the .sup files with SubtitleCreator.
  4. Remux with Muxman or IfoEdit (preferably Muxman, as it allows you to save project files).
  5. Merge the newly authored DVD with the previously created menus using VobBlanker.

Convert BDN .xml+.png to .sup

The following are the calls for BDSup2Sub to produce DVD subtitles: first the call for 2 colors, then the call for 3 colors. Please read the manual and play with the options directly in BDSup2Sub to get a feeling for what they do. The colors you choose have a direct effect on the way BDSup2Sub reduces them. If problems occur, you do not need to re-render the subtitles: you can easily replace colors with a batch script using XnView and then proceed with the command line.

java -jar PATH-TO-BDSup2Sub\BDSup2Sub.jar target-xml-file.xml
          output-idx-2-colors.idx /lang:en /atr:137 /ltr1:41
          /ltr2:42 /acrop:0
java -jar PATH-TO-BDSup2Sub\BDSup2Sub.jar target-xml-file.xml
          output-idx-3-colors.idx /lang:en /atr:137 /ltr1:41
          /ltr2:180 /acrop:0

BDSup2Sub.jar produces .sup files. With SubtitleCreator you can open them, apply a new color code (if you like), and export them as single images (the actual subtitles) into a new folder. Be aware that colors are stored centrally in the CLUT (color lookup table) and not directly in the video stream, so it is easier to replace colors after authoring with DVDSubEdit. You can use the .xml file produced by avs2bdnxml to extract the relevant information (e.g. start/end time code, size, position on the screen) by exporting it to a spreadsheet (open it with IE, right-click, and export to Excel). There are also small command-line tools available that convert .xml to .csv or any tab-separated format. Please also be aware that Aegisub works with decimal fractions of a second (the .ass file stores hundredths), whereas avs2bdnxml outputs frames (25 per second for PAL, 29.97 for NTSC; see the last part of the time code). With this information, and by writing a short

dir /B folder-with-images > file-with-image-names.txt

you can write the subtitle image names into a file, import this file into the spreadsheet as well, and export everything to a file suitable for DVD authoring suites. Then you can use a scripting language to create a subtitle file format of your choice and author your DVD.
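If you prefer a script to the spreadsheet detour, the same information can be pulled out of the .xml file directly. The Python sketch below is only an illustration: it assumes that the file follows the usual BDN layout, with `Event` elements carrying `InTC`/`OutTC` attributes and `Graphic` child elements carrying `X`, `Y`, `Width`, `Height` and the image file name as text. Check these names against your own file; the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["in", "out", "x", "y", "width", "height", "image"]


def bdn_events_to_rows(xml_text: str):
    """Yield one dict per subtitle image found in a BDN-style .xml text."""
    root = ET.fromstring(xml_text)
    for event in root.iter("Event"):
        for graphic in event.iter("Graphic"):
            yield {
                "in": event.get("InTC", ""),
                "out": event.get("OutTC", ""),
                "x": graphic.get("X", ""),
                "y": graphic.get("Y", ""),
                "width": graphic.get("Width", ""),
                "height": graphic.get("Height", ""),
                "image": (graphic.text or "").strip(),
            }


def rows_to_tsv(rows) -> str:
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting tab-separated text opens in any spreadsheet program and can serve as the starting point for whatever format your authoring suite expects.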

Internationalization – how to handle right to left (RTL) languages

So far, so good. This works easily with left-to-right languages. With Aegisub you can choose another font, so that not only Latin-based languages are displayed properly but also languages like Gujarati, Hindi, Khmer, Japanese, Traditional Chinese, etc.

However, if you try to import Arabic, Farsi, Hebrew, or any other right-to-left (RTL) language, you will encounter many problems. In short: everything is messed up! The punctuation marks (commas, quotes, etc.) are (almost all) in the wrong places. It hardly needs mentioning that other software (e.g. Microsoft Excel/Office) handles RTL properly. Among external software made solely for rendering subtitles, I have not found any that was capable of RTL import. Maybe the really expensive software can do it, but who wants to spend more than 2000 € just to render subtitles (and they won't look better than with the method described above)? The reason for the poor capability of these applications is that they do not work with the Unicode control characters. Punctuation marks seem to be handled LTR (left-to-right) instead of RTL. Although several authors claim to support RTL and even advertise RTL capabilities, their software does not (I tested it!).

So, how to proceed? Let's assume you already have time codes from another language. Bring the subtitle file into a format in which:

one line = one subtitle (with time codes)

The .ass format (Aegisub) does this. Then open the file with Windows Notepad (or another Unicode text editor, but NOT a word processor; MS Word is NOT a text editor). Insert a "\t" (TAB escape sequence) between the time codes and the text, e.g. in the .ass file format:

Dialogue: 0,0:00:12.94,0:00:17.82,person 1,person 1,0000,0000,0000,,<-INSERT HERE TAB SPACE ->TEXT

You can do that with search & replace. Save the file as Unicode (for Excel or another spreadsheet editor) and open it with a spreadsheet program. You then have two columns: one with style and time code information, one with pure text. Format the text column as needed for RTL or (in case of a new translation) insert the RTL text. Then copy both columns into a new text file (again: use Windows Notepad, a small and highly underestimated tool). Remove the "\t" (TAB) with search & replace and save the file as UTF-8 for Aegisub. Open it in Aegisub and check the preview.
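The TAB round trip can also be scripted instead of done by hand with search & replace. The following Python sketch uses hypothetical function names and assumes standard Dialogue lines with nine comma-separated fields before the text; commas inside the subtitle text are left untouched.

```python
def insert_tab(line: str) -> str:
    """Put a TAB between the nine leading fields and the subtitle text."""
    parts = line.split(",", 9)  # at most 10 parts; the text may contain commas
    if not line.startswith("Dialogue:") or len(parts) < 10:
        return line  # header, style, or malformed line: leave unchanged
    return ",".join(parts[:9]) + ",\t" + parts[9]


def remove_tab(line: str) -> str:
    """Undo insert_tab by dropping the first TAB of the line, if any."""
    head, _tab, text = line.partition("\t")
    return head + text
```

Apply `insert_tab` to every line before the spreadsheet step and `remove_tab` afterwards, then save the result as UTF-8 for Aegisub.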

Maybe everything is ok, but I doubt it. Many problems occur in the following cases:

  • A punctuation mark is placed directly before an automatic or manual line break. It then appears at the beginning of the subtitle and not where it should be (at the end of the line).
  • The same occurs at the end of sentences or lines (commas, periods, etc.): the marks do not appear at the end of the line.

This makes sense because from an LTR perspective they are placed at the end of the line (sentence, etc.). From an RTL perspective, however, this is not true. Mixing both writing directions is difficult. Now comes the hard manual work (if you know an automatic workaround, please let me know!). Open the same subtitle file with Windows Notepad and read the following website from Microsoft about Unicode control characters. Proceed as follows:

  1. Enable the display of Unicode control characters ("show unicode control chars").
  2. Go to the line and place the cursor directly before, e.g., the comma.
  3. Right-click, choose "insert unicode control char", and select "LRE" ("Start of left-to-right embedding (LRE)").
  4. Either insert the comma (or any other character) OR go to the position directly after the comma (if it is already there), then insert the Unicode control character "PDF" ("Pop directional formatting (PDF)").
  5. Reopen Aegisub and check what has changed.
  6. Repeat steps 1-5 until everything looks ok. If everything is ok in Aegisub, the rendering with avs2bdnxml won't fail.

You can avoid this work if you are disciplined and insert the LRE/PDF control characters while writing your RTL text for the very first time.
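The manual steps can be approximated with a script, although the result still has to be checked in the Aegisub preview. The Python sketch below is an untested-against-renderers assumption, not a proven workaround: it wraps every run of punctuation that sits directly before an .ass line break or at the end of the subtitle text in LRE (U+202A) and PDF (U+202C). The punctuation set and the function names are hypothetical choices.

```python
import re

LRE = "\u202A"  # start of left-to-right embedding
PDF = "\u202C"  # pop directional formatting

# Punctuation runs (Latin and Arabic comma, semicolon, question mark, ...)
# directly before an .ass line break (backslash + N or n) or at the text end.
TRAILING_PUNCT = re.compile(r"[,.;:!?\u060C\u061B\u061F]+(?=\\[Nn]|$)")


def wrap_trailing_punctuation(text: str) -> str:
    """Surround line-final punctuation with LRE ... PDF."""
    return TRAILING_PUNCT.sub(lambda m: LRE + m.group(0) + PDF, text)


def wrap_dialogue_line(line: str) -> str:
    """Apply the wrapping to the text field of a Dialogue line only."""
    parts = line.split(",", 9)  # the tenth field is the subtitle text
    if not line.startswith("Dialogue:") or len(parts) < 10:
        return line
    parts[9] = wrap_trailing_punctuation(parts[9])
    return ",".join(parts)
```

Punctuation in the middle of a line is not touched, so cases such as quotes still need the manual procedure. Running the function twice does not add a second pair of control characters, because wrapped punctuation is then followed by PDF rather than by a line break.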

Special consideration – interlaced video material and subtitles

Last problem: interlaced material. With interlaced video material, the y0/y1 coordinates have to be even, not odd. This is not documented very well, and it took me a long time to figure it out. Someone pointed out the spumux tutorial, where the problem is described. However, avs2bdnxml does not recognize this problem. It's a bug. You will see the problem only on hardware DVD players; I never encountered it on a PC, Mac, etc. On such players your subtitles look messed up again:

[Figure: messed up subtitles in case of interlaced video material]

To prevent this, you have to check the .xml metafile created by avs2bdnxml for odd y coordinates (which different values are present?) and replace them with

y_new = y_old + 1

You can make the replacement within the .xml file using search & replace if you know what you are searching for. A good approach is therefore to convert the .xml to .csv or a tab-separated file and run a short descriptive statistical analysis (i.e. frequency tables). Then you immediately see the different y coordinates that occur. There will be only a few of them, and a manual replacement is faster than coding a script that works directly on the .xml (unless you are used to doing that). Again, there is no need to re-render the subtitles; re-rendering has no effect on the problem. It seems spumux and dvdauthor are aware of the problem, but I had trouble using the MPEG-2 video stream together with spumux (when importing pre-rendered subtitles), and spumux alone does not produce results comparable to those of avs2bdnxml. The quality of avs2bdnxml is superb. If someone has the time, they could code their own GUI based on ImageMagick, which can convert text to images and is very powerful.
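For those who would rather script the replacement, a minimal Python sketch follows. It assumes that the y coordinate is stored in the `Y` attribute of `Graphic` elements, as is usual for BDN .xml files; verify this against your own file. The function names are hypothetical. The first function produces the frequency table of y values, the second applies y_new = y_old + 1 to every odd value.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def y_value_counts(xml_text: str) -> Counter:
    """Count how often each y coordinate occurs (the frequency table)."""
    root = ET.fromstring(xml_text)
    return Counter(int(g.get("Y", "0")) for g in root.iter("Graphic"))


def fix_odd_y(xml_text: str) -> tuple[str, int]:
    """Return the .xml text with every odd Y raised by 1, plus the change count."""
    root = ET.fromstring(xml_text)
    changed = 0
    for graphic in root.iter("Graphic"):
        y = int(graphic.get("Y", "0"))
        if y % 2 == 1:
            graphic.set("Y", str(y + 1))  # y_new = y_old + 1
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Note that `ET.tostring` does not write an XML declaration, so add one back if your downstream tool expects it, and keep a copy of the original file.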

Unfortunately, there is no software available yet that prevents all the problems mentioned here. However, the method described above produces nice-looking subtitles for DVD authoring. They look really good even on an HD TV screen. Don't forget to tweak the colors with DVDSubEdit. Ghost boxes (see Aegisub for how to create them) look better than fonts with an outline. Both solutions make use of antialiasing. With ghost boxes it is advisable to use a transparency of 12 (see DVDSubEdit) instead of 15. Then you can see the background (i.e. the movie) through the subtitles, but not too much. This is very pleasant for the eyes and supports readability.

That's all. Happy subtitling and rendering.