Akila DJ: English conventions cause trouble when they meet Sinhala!

2010-01-14

English conventions cause trouble when they meet Sinhala!

When typing, there are input methods that do the right thing even when you type the kombuva first. No trouble there!

At present the kombuva comes after the letter in Unicode, so there is no trouble when listing (sorting) either.
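As a rough illustration of why the stored order helps, here is a minimal Python sketch. It assumes nothing beyond plain code point comparison, which is what Python's built-in `sorted` does for strings; the sample words are my own.

```python
# Sample words written with escapes so the stored code points are unambiguous.
words = [
    "\u0d9a\u0dd9",  # "කෙ" = KAYANNA + kombuva
    "\u0d9c",        # "ග"
    "\u0d9a\u0dcf",  # "කා" = KAYANNA + aela-pilla
    "\u0d9a",        # "ක"
]

# Python compares strings code point by code point, so every word that
# starts with the consonant U+0D9A stays together, ahead of U+0D9C.
by_code_point = sorted(words)
print(by_code_point)
```

Because the consonant is stored first, all the forms of "ක" group together before "ග" without any special handling. This is only code point order, not a full Sinhala dictionary order.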

If the kombuva were placed first in the list, apparently problems come up when working out the order!

When sorting alphabetically, I wonder whether it is hard to take the consonants (gathakuru) first and then, as a subcategory under each consonant, list the vowel signs (aela-pili, paa-pili) that belong to the vowel (panakuru)?

If the kombuva were placed first in the list, it looks like that is what would have to be done!

First you look at letters like 'ක', 'ග', 'ට'... and skip signs like 'ා', 'ැ', 'ො', 'ෙ'...

After that you have to look at 'ා', 'ැ', 'ො', 'ෙ'. While looking at the 'ා' or 'ැ' that follows the letter, you also have to look at the 'ෙ' that sits in front of it.

The job gets complicated...
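To make the extra work concrete, here is a hedged Python sketch of the reordering step that a kombuva-first (visual order) text would need before it could be sorted like ordinary Unicode text. The function name `visual_to_logical` and the consonant range check are my own assumptions, and the sketch assumes the visual-order text has already been mapped to Unicode characters (legacy fonts use their own glyph codes).

```python
KOMBUVA = "\u0dd9"  # SINHALA VOWEL SIGN KOMBUVA


def is_consonant(ch: str) -> bool:
    # Rough check: the Sinhala consonants sit between U+0D9A and U+0DC6.
    # (A few code points inside this range are unassigned.)
    return "\u0d9a" <= ch <= "\u0dc6"


def visual_to_logical(text: str) -> str:
    """Move a kombuva written before a consonant to just after it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == KOMBUVA and i + 1 < len(text) and is_consonant(text[i + 1]):
            out.append(text[i + 1])  # consonant first
            out.append(ch)           # then the kombuva
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(visual_to_logical("\u0dd9\u0d9a") == "\u0d9a\u0dd9")
```

Text that is already in consonant-first order passes through unchanged, so the function only costs something for the kombuva-first case.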

Really, according to the Sinhala language, is "කා" one letter or two?

In Unicode, "කා" looks like two characters!

"කො" likewise looks like two characters in Unicode, and like three in the non-Unicode DL and FM fonts!

ක්+ආ = කා

To Unicode this is two characters, but in terms of our language, do the two combine to form a single letter? Or does "කා" contain two letters?

The alphabetical-order trouble with vowel signs (pillam) in Unicode comes up because this is stored as two characters.
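A quick way to check the "two characters" claim is to list the code points. This is a small Python sketch using only the standard `unicodedata` module; the helper name `code_points` is my own.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


for text in ("\u0d9a\u0dcf", "\u0d9a\u0ddc"):  # "කා" and "කො"
    names = [unicodedata.name(ch) for ch in text]
    print(text, len(text), code_points(text), names)
```

Both strings have length 2 in Python, which counts code points: one for the consonant and one for the vowel sign.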

If the characters combined into single letters, the kombuva trouble would go away when listing, but the alphabet would grow large and other inconveniences could well come up...

"කෙ" is not like "කා"....

Although it is formed from ක්+එ, the kombuva in "කෙ" is written before the letter!
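The difference between what is drawn and what is stored can be seen in a short Python sketch. It only inspects the stored sequence for "කෙ"; how the kombuva ends up drawn on the left is the renderer's job and is not shown here.

```python
import unicodedata

ke = "\u0d9a\u0dd9"  # "කෙ": KAYANNA followed by the kombuva sign

# Pair each stored code point with its Unicode general category.
stored_order = [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in ke]
print(stored_order)
```

The consonant (a letter category) comes first in storage and the kombuva (a mark category) second, even though the kombuva is displayed to the left of the consonant.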

Complex languages!

Switching to Airtel is not the solution to this :p

To sort out things like this, can't we learn something from other languages with complex scripts, such as Tamil, Telugu, Hindi, Thai, Cantonese, Japanese ...?

The foundation for computers and keyboards was laid in English, so trouble comes up when we adapt it to what we need :)

European scripts such as Roman and Cyrillic hardly have problems like these!

34 comments:

Sam, January 15, 2010 at 1:10 AM
The solution that fits this is the one from Donald, the man whom the owners of Sinhala Unicode call an old fool, a madman, a Catholic, and many other affectionate names. Besides that problem, when several characters have to be used to write one letter, more space is used in the database and more time goes into computation. Every bit of that extra time and space is our money. Taken one at a time it is a tiny amount of space and time, but in the long run it is a very large cost.

As if all that were not enough, with this mess you cannot count the number of letters correctly. For example, the embarkation form has twenty boxes for writing your name. By the time you have written a twenty-letter name in it, it has become forty or fifty characters. I too once started rewriting a website in Sinhala Unicode, and when I got to data validation I got stuck like a wild boar caught in a trap. After that I forgot the Unicode mess, left the site in English, and went to sleep stretched out on grandfather's rope bed.
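The letters-versus-characters gap in the form example can be sketched in Python. This is only an approximation under my own assumptions: it treats every non-mark code point as the start of a letter and folds a consonant that follows a zero-width joiner into the previous letter. A real validator would more likely use grapheme cluster segmentation as described in Unicode Standard Annex #29. The function name `count_letters` is hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner, used in conjuncts such as "ක්‍ෂ"


def count_letters(text: str) -> int:
    """Approximate the number of written letters (not code points)."""
    count = 0
    prev = ""
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")  # vowel signs, al-lakuna
        joins_previous = prev == ZWJ  # consonant glued onto the previous letter
        if not is_mark and ch != ZWJ and not joins_previous:
            count += 1
        prev = ch
    return count


train = "\u0daf\u0dd4\u0db8\u0dca\u0dbb\u0dd2\u0dba"  # "දුම්රිය"
print(len(train), count_letters(train))
```

For "දුම්රිය" the code point count is 7 while the approximate letter count is 4, which is the kind of difference a twenty-box form field would have to allow for.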
Anonymous, January 15, 2010 at 8:04 AM
This character-versus-glyph question is not unique to Sinhala. The problem comes up because many people still think in terms of the ASCII era.

http://www.unicode.org/standard/where/

http://www.unicode.org/reports/tr17/#CharactersVsGlyphs

Sorting Unicode characters does not need a numeric order that comes from the code chart itself. We can apply whatever order we like (according to how the alphabet is defined).
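The idea that sort order can be defined separately from code point values can be sketched in Python with a custom sort key. This is a toy single-level key under my own assumptions, not the multi-level Unicode Collation Algorithm from the linked report, and the `custom_order` list is a deliberately artificial alphabet.

```python
def make_sort_key(alphabet):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Characters missing from the alphabet sort after all known ones.
        return [rank.get(ch, len(alphabet)) for ch in word]

    return key


# Hypothetical order: "ග" before "ක", and the kombuva before the aela-pilla,
# the opposite of their code point order.
custom_order = ["\u0d9c", "\u0d9a", "\u0dd9", "\u0dcf"]

words = ["\u0d9a\u0dcf", "\u0d9c", "\u0d9a\u0dd9", "\u0d9a"]  # කා, ග, කෙ, ක
sorted_words = sorted(words, key=make_sort_key(custom_order))
print(sorted_words)
```

The result follows the order given in `custom_order` rather than the numeric code point order, which is the point being made: the stored values and the sort order are separate decisions.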

http://unicode.org/reports/tr10/

Donald made a fool of himself because, either not knowing these facts or knowing them full well, he told nonsense such as "standard Unicode has no "දු"" to mislead people and tried to make money from his patent. He kept falsely claiming that Unicode cannot write this or that, and when shown that it can, he pretended not to see, vanished from that forum, and repeated the same thing like a parrot on another forum. It is not because he is Catholic.

For some bloggers who cannot stand Sinhala Buddhists anyway, his being Catholic is reason enough to support Donald.
Kulendra, January 15, 2010 at 8:25 AM
Let me first answer the question that was asked: 'කෙ' is a single letter in the Sinhala alphabet. The number of letters in the Sinhala language is not 60; it is the result of vowels x consonants.

There are solutions for what has happened. I think this was discussed some time ago on the Sinhala bloggers' forum or somewhere else. If I can, I will find the link and post it.
Donald Gaminitillake, January 15, 2010 at 9:37 AM
This is the first time that someone has told the truth.
Quote
"Standard Unicode has no "දු""
Unquote

"කා" is one character. "දු" has no UTF value in the Unicode database; this is hidden in the Uniscribe Unicode script processor.

This is the problem that all Indic languages are facing. We have to publish values for all Sinhala characters.

Only I have done it, and I have the copyrights and a patent for this thinking.

This is the only solution. I offered it to SLSI as objections but was overruled by the VKS group.
Therefore I got my rights legally.

Donald Gaminitillake
Let us change the standard
Anonymous, January 15, 2010 at 12:16 PM
Mr. Donald, the "දු" that is missing from standard Unicode is missing from the standard Sinhala alphabet as well. So correct the alphabet first.

There, Sam's Donald is once again displaying his ignorance of Unicode. Anyone can find the answers to these points through the Unicode Consortium links above.
Donald Gaminitillake, January 15, 2010 at 1:04 PM
Quote
is missing from the standard Sinhala alphabet as well.
Unquote

If "දු" is not a Sinhala character, how do you write "train" in Sinhala?

Quote
So correct the alphabet first
Unquote

The Sinhala hodiya is correctly published in
ISBN 955-98975-0-0

The Sinhala hodiya contains all Sinhala characters and letters.

Donald Gaminitillake
Let us change the standard
Donald Gaminitillake, January 15, 2010 at 1:07 PM
Quote
Anyone can find the answers to these points through the Unicode Consortium links above.
Unquote

Publish the Unicode database value for "කා" and "දු":
the properly registered UTF value.

Donald Gaminitillake
Let us change the standard
Donald Gaminitillake, January 15, 2010 at 1:15 PM
Yes, more from Unicode.

http://www.unicode.org/reports/tr2.html

A person called Andy Daniels wrote the original Sinhala proposal; it was not written by any Sinhala person or Sri Lankan. In his proposal he writes as follows.
Quote
There is a standard extant for Sinhala described in A Standard Code for
Information Interchange in Sinhalese by V.K. Samaranayake and S.T. Nandasara
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found
to be an inadequate basis for a modern, computer-based interchange code,
though it is adequate to handle the capabilities of a Sinhala typewriter for
representing contemporary colloquial Sinhala.
Unquote
දේශද්‍රොහියා, January 15, 2010 at 11:57 PM
As for us, we have not had to worry about this "දු" and "කා" business.

But one look is enough to tell who this worthless wretch posting anonymous comments is. We, at least, do not post anonymously while cursing anonymity. We are anonymous anyway. But this character always has something different in his mouth.

The other day, while I was eating, I threw a dog a tiny scrap of meat, saying "here, eat this". It set about savouring it, though there was hardly anything on it. I felt sorry for it and went closer to give it a big piece of meat. And what do you know, it growled at me. The wretched scrawny dog was fighting for dear life to protect the pittance it had got. I did not know whether to cry or laugh.

Seeing these hangers-on now, I realise that some people are just like that pup. They cling to the first rag they were given and will not let anyone change it, crying that it is being destroyed. They keep hugging that same nonsense, and all they try to do is growl and bark to chase away anyone who comes to change it.
Anonymous, January 16, 2010 at 1:43 AM
Donald,

If your argument that the Sinhala alphabet has a "දු" is that "දුම්රිය" (train) can be written in Sinhala, then by the same argument you must now accept that Unicode has a "දු" too. After all, it can be written.

If you want it, the Unicode value of that character is 0DAF 0DD4. Do not howl that this is the combination of dayanna and the paa-pilla rather than a separate code point. Whatever the case for Donald, anyone with a brain will see from reading the Consortium's own articles above that there is no problem here at all.
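The sequence 0DAF 0DD4 can be checked directly in Python. This is a small sketch using the standard `unicodedata` module; the variable names are my own.

```python
import unicodedata

du = "\u0daf\u0dd4"  # the two code points quoted above, i.e. "දු"
train = "\u0daf\u0dd4\u0db8\u0dca\u0dbb\u0dd2\u0dba"  # "දුම්රිය"

# Look up the official character names for each code point in the sequence.
names = [unicodedata.name(ch) for ch in du]
print(names)

# The two-code-point sequence behaves like any other string value:
# it can be compared, searched for, and used as a dictionary key.
starts_with_du = train.startswith(du)
lookup = {du: "du"}
print(starts_with_du, lookup[du])
```

So although "දු" has no single code point of its own, the sequence is an ordinary string as far as program text is concerned.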

දේශද්‍රොහියා - I have no time to respond to personal rubbish.
දේශද්‍රොහියා, January 16, 2010 at 9:22 AM
Yes....

You must be heading off to the village today too... you blockhead...

So tell us exactly what that Con-whatever-sortium of yours says, that thing which is worth so much gold to you, which must not be changed, and which you worship with offerings of flowers?
Sam, January 16, 2010 at 9:32 PM
Anonymous
I wrote a fairly long answer about this here. Do you have anything to say?
Donald Gaminitillake, January 16, 2010 at 10:25 PM
Quote
you must accept that Unicode has a "දු" too
Unquote

"දු" is one character. You are giving two characters, which are:
01 CODE POINT VALUE: 0DAF
02 NAME (UNICODE NAME): SINHALA LETTER ALPAPRAANA DAYANNA

01 CODE POINT VALUE: 0DD4
02 NAME (UNICODE NAME): SINHALA VOWEL SIGN KETTI PAA-PILLA

This is not the single character "DU".

Therefore there is no UTF value for "DU", and you cannot use "DU" in computer programming.

Donald Gaminitillake
Let us change the standard
Donald Gaminitillake, January 16, 2010 at 10:33 PM
Quote
Do not howl that this is not a code point.
Unquote

If a character is not a code point in the Unicode database, you cannot use it across any platform or in any computer program.
You admit that "DU" has no code point in the Unicode database.

See the quote from the Unicode report by Andy Daniel:

Quote
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found to be an inadequate basis for a modern, computer-based interchange code, though it is adequate to handle the capabilities of a Sinhala typewriter
Unquote

So you guys use a typewriter, not a computer.
Donald Gaminitillake
Let us change the standard
මධුර, January 16, 2010 at 11:15 PM
There is nothing wrong with Sinhala Unicode. Because it is a standard, it has to follow the Unicode rules, and that is where the kombuva issue comes from. But doing it that way is actually a convenience: in sorting, the consonant is placed first and the kombuva and other signs that follow fall into the proper order, so there is nothing wrong with it. If people want a standard covering every single Sinhala letter, they may be able to do something on their own supercomputers with separately encoded letters spread over a huge range such as 0000 to FFFF, but trying to stuff that into something like Unicode is foolish. UTF-8 is the most widely used encoding, and its length is 2 bytes. To make room within 2 bytes for the letters of every language used in the world, languages with intelligent alphabets that can be compressed have to allow for that. Treating Unicode as our inheritance and making a scene along the lines of "put this in, no, that is not enough, put all of these in or we do not want it" is not sensible. If you want that much, get space allocated again somewhere in UTF32; those who insist that all those letters must be in there can then produce text files measured in gigabytes. But expecting that inside UTF8, which is allocated for more or less everyone, seems to me not just foolish but a display of foolishness.
Donald Gaminitillake, January 17, 2010 at 1:39 PM
So why not leave the Unicode database and create a new standard for all Indic languages?

Donald Gaminitillake
Let us change the standard
Donald Gaminitillake, January 17, 2010 at 1:45 PM
Quote
There is nothing wrong with Sinhala Unicode. Because it is a standard, it has to follow the Unicode rules,
Unquote

I challenge you to write a simple program in Sinhala that includes "KI" "KA" "KU" "ri" "gu" "DU" "ksha" in the program text.

Donald Gaminitillake

Let us change the standard
Donald Gaminitillake, January 17, 2010 at 1:49 PM
Also note that two bytes can hold up to 64,000 characters, or akuru, per one scale.

All the Indic languages together need only 50 such two-byte scales.

Let's exit the Unicode Consortium and make our own standard.

Donald Gaminitillake
Let us change the standard
madura, January 17, 2010 at 10:38 PM
Huh, a challenge? What do you mean by "include those"? I can include them like this: කි ක කු රි ගු ඩු ක්‍ෂ. If you are not seeing these properly, then that's not my problem :P
Anonymous, January 18, 2010 at 8:10 AM
>> This is not the single character "DU"
>> Therefore there is no utf value for "DU"

Have you ever read the Unicode Consortium's articles I mentioned above? If so, it must be clear to you that there is no NEED in Unicode for such 'single characters'. Therefore, 'du' is a character in Unicode. If you don't agree, then complain to the Unicode Consortium about their design of Unicode. Any child who understands English will understand this simple fact.

There are a few bugs such as those Sam mentioned, but those are due to old implementations of functions that did not take Unicode properly into consideration. Those will be corrected with time, in the same way that Win95 did not support Sinhala Unicode, Win XP did with add-ons, and Win Vista and Linux support it natively.

http://docs.sun.com/source/816-6409-10/ident.htm#1009568

>> Therefore there is no utf value for "DU" or you cannot use "DU" in computer programming
>> So you guys use a typewriter not a computer

It's typewriters that cannot figure out that a letter and a pilla must be combined to make a single character. Computers are smarter than that, and Unicode provides all the means to define the joining rules.

Now, it is clear to anyone who reads English that the problems claimed by Donald are simply the way Unicode is designed. Donald now knows that more and more people understand this; that's why he is now saying "lets forget unicode" so that he can make money out of his stupid worthless patent.
Donald Gaminitillake, January 18, 2010 at 6:16 PM
Quote
that's why he is now saying "lets forget unicode" so that he can make money out of his stupid worthless patent.

Unquote

If my registration is stupid, why worry? With a stupid system, how can I make money?

You guys have a wrong system that cannot be implemented across all platforms or across any application.

My system can be applied on any platform or in any application. Without code points, Sinhala "DU" or the KSHA of Rajapaksha has no value as text.

Donald Gaminitillake

Let us change the standard
Anonymous, January 19, 2010 at 12:41 AM
>>> You guys are having a wrong system that cannot be implemented across all platforms or across any application.

You have been saying this since a time when no OS supported Sinhala Unicode natively.

Today, both Windows and Linux support Sinhala Unicode natively. Apple will follow suit. Repeating the same thing without even reading the Unicode specification correctly is your problem. That's why you have to say idiotic things, e.g. "there is no 'දු' in unicode". You don't even understand what a Unicode 'character' is, and you try to impose your own understanding as a 'standard' for everyone. That's why you don't have anyone other than the above filthy-mouthed 'deshadrohiya' for your support.
Sam, January 19, 2010 at 10:15 AM
Anon, I have a challenge for you on my web page. Prove you are correct.
Donald Gaminitillake, January 20, 2010 at 12:00 PM
Quote
'deshadrohiya'
Unquote

What is the definition of 'deshadrohiya'?

Can you copy a simple Sinhala text from Microsoft Word to an application running on Linux?

Or copy a simple Sinhala text from Word to Illustrator?

If you cannot perform the above tasks, you are the biggest 'deshadrohiya'.

Donald Gaminitillake
Let us change the standard
Anonymous, January 20, 2010 at 8:07 PM
Sam,

In the old days, Donald's argument was that Sinhala Unicode is not natively supported by OSes and "needs installing things". The world moved on, and this isn't true anymore for some major OSes. One of these days, the Java, JavaScript and HTML people will get around to implementing their stuff properly so that it works with Unicode as expected. In the ASCII days, there was no such thing as wide character support. Now there is, for almost any decent high-level language. So the "proof" you have on your site just proves that HTML text boxes don't support Unicode fully YET.
Anonymous, January 22, 2010 at 12:08 AM
This comment has been removed by a blog administrator.
somebody-anybody, January 22, 2010 at 12:10 AM
>>
Can you copy a simple sinhala text from microsoft word to an application running on linux?
<<
Yes, you can!
But it is a bit hard. From inside MS Word, even English text is hard to copy, isn't it?
Inside Linux, that is! :)

>>
Or copy a simple sinhala text from word to illustrator?
<<
So you are trying to say the problem lies in Illustrator, right? ;)
It is enough that text copies from Microsoft Word to Firefox, which is one of Microsoft's competitors!
In my Photoshop CS4, Sinhala does copy across. It is the rendering that is broken.
Besides, sometimes I cannot even copy English text from Photoshop CS4 into Photoshop CS4 itself!.. At those times the copy command simply does not work.

>>
If you cannot perform above task you are the biggest 'deshadrohiya'
<<
The idea seems good, but it is not clear enough!
Sam, January 22, 2010 at 10:09 AM
Anonymous.. Yes. True. Not only HTML, but every single other language and database and everything. And you have hope that "they" will "fix" it for "us". That is one of the most excellent programming concepts I have come across in my life!
Anonymous, January 22, 2010 at 11:00 PM
You can use custom Sinhala tags for CSS tags, can't you?
Does anyone know more about this?

Also, although UTF8 is suitable for now, encodings like UTF32 will probably come into much wider use in the future, as things like storage space, processing power and internet speed increase.....
Anonymous, January 23, 2010 at 11:35 AM
Sam,

Your excellent knowledge of software engineering seems to miss the little piece of advice "if it ain't broken, don't fix it". Sinhala Unicode is not broken. It works exactly as defined in the spec. Browsers etc. are broken because they don't support Unicode AS DEFINED IN THE UNICODE SPEC. So they will be fixed eventually, in the same way that Mr Donald's predictions of doom proved false when OSes supported Sinhala Unicode by default. Once the fixes are available, people won't have to reinvent the wheel every day for every application. For instance, when MS fixes their rendering library ONCE, that will apply to every single program written from then on in an MS language.

If you have a problem with that, you need to blame the unicode consortium for "wrong spec" or "spec that are not YET supported by software". (now I hope the "deshadrohiya" anonymous won't come back to ask what the consortium is. heh heh heh)
madura, January 24, 2010 at 11:20 AM
Have a look in here, I've made a little library which gets the job done. And it's fast! :D
http://madurax86.heliohost.org/?p=332
madura, January 24, 2010 at 11:29 AM
@ akila

UTF32 or UTF16 will never be as common as UTF8. UTF8 is the only Unicode encoding format that is backwards compatible with ASCII. Since most English text is still written in ASCII, UTF16 and UTF32 will not have any advantage and will not be commonly used. The main reason is that UTF16 and UTF32 are not compatible with ASCII the way UTF8 is.
Here are the differences between the UTF encodings:

* UTF8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text.

* UTF16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes, code points U+10000 to U+10FFFF take 4 bytes. Bad for English text, good for Asian text.

* UTF32: Fixed-width encoding. All code points take 4 bytes. An enormous memory hog, but fast to operate on. Rarely used.

- taken from http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32

So UTF8 is and will be the most common encoding as far as I can think of. :)
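For anyone who wants to check the byte counts listed above, here is a small Python sketch. It uses only the built-in `str.encode`; the helper name `encoded_sizes` and the sample characters are my own, and the little-endian codec names are chosen so that no byte order mark is added to the counts.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes text takes in each UTF encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


ascii_sizes = encoded_sizes("A")          # U+0041, in the ASCII range
sinhala_sizes = encoded_sizes("\u0d9a")   # "ක", in the U+0800 to U+FFFF range

print(ascii_sizes)
print(sinhala_sizes)
```

A Sinhala code point falls in the U+0800 to U+FFFF band from the list, so it takes 3 bytes in UTF8, 2 in UTF16 and 4 in UTF32, while an ASCII letter takes 1, 2 and 4.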
somebody-anybody, January 24, 2010 at 11:23 PM
>>
"UTF32 or UTF16 will never be common as UTF8"
<<

Future is not Predictable!...
somebody-anybody, January 27, 2010 at 3:01 PM
Here is a link relevant to this sort of thing:

http://www.kotuwegedara.com/post/2009/12/18/e0b6b8e0b6a7-e0b685e0b784e0b6b8-e0b780e0b6bae0b799-e0b6b4e0b791e0b6b1-e0b6b4e0b6b8e0b6ab.aspx

http://www.kotuwegedara.com/category/net.aspx

:)