Dharma Records

A record of Ānandajoti’s publication work.


Conversion of Ancient Buddhist Texts to Unicode

Posted on September 11, 2011 / September 9, 2011 by Ānandajoti

Last year I updated the Ancient Buddhist Texts website so that it was possible to read the texts in Unicode. However, the underlying encoding was still in the old standard and was changed through javascript, so each page took a couple of seconds to change over, giving a flash of the old encoding.

Over the past couple of weeks I have made a new Unicode font (ITM_TMS_UNI) and I have now converted the whole website to the new format, and will only be publishing in this format from now on. Later I also plan to convert the .pdf documents to the same format, but as the font is embedded in these it is less urgent.

The conversion itself was a bit nerve-racking, as I had over 2,200 html documents to convert and it was all done in one go. I did of course have back-ups, but there was always the possibility that something would go wrong and I wouldn’t find out until too late. As it was, I spent a whole day checking documents for errors.

I used a modification of the script that was developed for the on-the-fly javascript conversion previously used on the site, and I tested it carefully. As far as I can see so far, the whole conversion went off very well, with only a few idiosyncrasies to sort out in the files themselves.
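For readers curious what such a batch conversion might look like, here is a minimal sketch in Python rather than the script actually used. The mapping table, the `.bak` backup suffix, the `latin-1` source encoding and the function names are all assumptions made for illustration; the real slots of the old font are not listed here.

```python
import shutil
from pathlib import Path

# Hypothetical legacy-to-Unicode table: the real font's slots are not
# given in this post, so these two pairs are purely illustrative.
LEGACY_TO_UNICODE = {
    "\u00e5": "\u0101",  # assumed: legacy slot displayed as "å" holds "ā"
    "\u00f2": "\u1e0d",  # assumed: legacy slot displayed as "ò" holds "ḍ"
}
TABLE = str.maketrans(LEGACY_TO_UNICODE)


def convert_text(text: str) -> str:
    """Replace every legacy character that appears in the table."""
    return text.translate(TABLE)


def convert_file(path: Path) -> None:
    """Back up one file, then rewrite it as UTF-8 with converted text."""
    backup = path.with_suffix(path.suffix + ".bak")
    shutil.copy2(path, backup)  # keep the original in case of errors
    # Assumption: the legacy files are single-byte (read here as latin-1).
    text = path.read_text(encoding="latin-1")
    path.write_text(convert_text(text), encoding="utf-8")
```

Looping `convert_file` over every `.html` file in a folder would do the whole site in one pass, which is why having the back-up copy beside each file is reassuring.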

The advantage of Unicode is that even a casual visitor can read the documents in two sections of the website (Texts & Translations and English Only), and won’t need to install any special font.

The disadvantage is that although Unicode can encode nearly every script, I use many characters that have been given no font code-slot, which makes every font that uses that character set essentially a “private” font, requiring (as with the old encoding) that the font be installed to be able to read the documents successfully.

This is very problematic, especially for Indologists. For instance, there is no code-slot for ring under r, which signifies one of the main vowel characters in Sanskrit. This also applies, of course, to all its derivatives, like ring under r with macron, ring under r with accent, etc., as well as to ring under l and its derivatives.
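The missing slot can be checked with a few lines of Python. This is only a sketch, and the helper name `has_precomposed` is my own. It asks Unicode's NFC normalisation whether a base letter plus a combining mark collapses into a single code-slot. The ring can still be written as a separate combining character (U+0325) after the letter, but whether that displays properly depends on the font.

```python
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
RING_BELOW = "\u0325"  # COMBINING RING BELOW


def has_precomposed(base: str, mark: str) -> bool:
    """True if base + mark composes to one code point under NFC."""
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) == 1


print(has_precomposed("r", DOT_BELOW))   # True: U+1E5B exists
print(has_precomposed("r", RING_BELOW))  # False: no single slot
print(has_precomposed("l", RING_BELOW))  # False: no single slot
```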

Therefore anybody wishing to display Sanskrit in Roman script has two choices: use a private font, which will leave the characters unreadable on most machines unless the special font is installed, or change the Unicode encoding to something non-standard, but which is close to the encoding required.

In preparing the new site I have in fact used both solutions. As ring under r is the only unallocated character that is used in the two sections named above, I have replaced it with dot under r, which has a code-slot, and is found in most extended Unicode fonts, but is the wrong character for the vowel.
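The substitution can be sketched as follows. This is an illustration under an assumption, not the script used on the site: it supposes that ring under r is stored as the letter followed by the combining ring below (U+0325), whereas in a private font it would more likely sit in a slot of its own. The function name `ring_to_dot` is hypothetical.

```python
import re
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
RING_BELOW = "\u0325"  # COMBINING RING BELOW


def ring_to_dot(text: str) -> str:
    """Swap ring under r for dot under r, keeping macrons and accents."""
    # Decompose first, so the ring sits directly after the base letter.
    decomposed = unicodedata.normalize("NFD", text)
    swapped = re.sub("([rR])" + RING_BELOW, r"\1" + DOT_BELOW, decomposed)
    # Recompose, so the result uses the single code-slots where they exist.
    return unicodedata.normalize("NFC", swapped)


print(ring_to_dot("kr\u0325ta"))  # kṛta
```

Because the text is normalised on the way in and out, any other characters that were stored in decomposed form will also come back composed, which is worth bearing in mind before running it over a whole site.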

But in the Original Texts and Prosody sections there are dozens more characters that have no code-slots, so this solution would bring no advantage, as the font will have to be installed anyway. There I have maintained strict encoding, and the only solution is to download and install the font first.

I might mention that the .epub and .mobi files which come from the English Only section were already prepared in Unicode. The Reference and Maps section has been converted to quasi-Unicode with the dot under r character in the few places where it applies.

I am now eagerly awaiting the promised update to the text editor I use to prepare the site, NoteTab Pro, which is being revised to deal with Unicode and to add extra syntax highlighting. I rely heavily on this editor because of its scripting ability, and waiting for it was one of the main reasons I held back on the conversion to Unicode for so long.
