Monday, August 17, 2009

erlang: Unicode support for your filenames...

Since R13B, Erlang has had Unicode support for strings through the "unicode" module.

I'm working on an interface between the Windows kernel and an Erlang VM, and I find this "unicode" module really
helpful.

For your information, the Windows kernel internally encodes every file path and file name as a little-endian UTF-16 string.
Exchanging information between those two worlds means converting between UTF-16 binaries and ordinary Erlang strings (lists of characters).

For example, you can create a UTF-16 binary string like this:

unicode:characters_to_binary("your string", latin1, {utf16,little}).


This says that your input string is encoded as "latin1" and that you want a binary encoded as little-endian UTF-16.
Really easy!
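
To see what that call actually produces, here is a minimal round-trip sketch. The module name utf16_roundtrip and the function demo/0 are made up for illustration; only the two "unicode" calls come from the standard library. It assumes a short ASCII input so the bytes are easy to read.

```erlang
-module(utf16_roundtrip).
-export([demo/0]).

%% Hypothetical demo function, not part of the original post.
demo() ->
    %% Each Latin-1 character becomes two bytes, low byte first.
    Utf16 = unicode:characters_to_binary("AB", latin1, {utf16, little}),
    <<65,0,66,0>> = Utf16,
    %% Decoding gives back a list of code points, i.e. a plain string.
    "AB" = unicode:characters_to_list(Utf16, {utf16, little}),
    ok.
```

Calling utf16_roundtrip:demo() returns ok if both pattern matches succeed, and crashes with a badmatch otherwise.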

Here's some free code that lets you easily manipulate file paths and filenames.
I hope this will help someone :p


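-module(filename_utils).

-export([extension/1, basename/1, dirname/1]).
-export([normalize/1, utf16toansi/1]).
-export([test/1]).

%% extension/1, basename/1 and dirname/1 take a little-endian UTF-16
%% binary and return an ordinary Erlang string. The "filename" module
%% follows the path conventions of the host OS.

extension(Bin) ->
    filename:extension(utf16toansi(Bin)).

basename(Bin) ->
    filename:basename(utf16toansi(Bin)).

%% nativename/1 converts the result to the native form for the host OS.
dirname(Bin) ->
    filename:nativename(filename:dirname(utf16toansi(Bin))).

%% Quick check. Mode is the name of an exported function of this module,
%% e.g. filename_utils:test(basename).
test(Mode) ->
    Word = "C:\\Program Files\\WINWORD.EXE",
    File = unicode:characters_to_binary(Word, latin1, {utf16,little}),
    ?MODULE:Mode(File).

%% Decode a little-endian UTF-16 binary into a list of code points.
utf16toansi(Bin) ->
    unicode:characters_to_list(Bin, {utf16,little}).

%% Split a path into {Base, Path, Ext}.
%% Accepts either a plain string or a UTF-16 binary.
normalize(File) when is_list(File) ->
    Path = filename:dirname(File),
    Base = filename:basename(File),
    Ext = filename:extension(File),
    {Base, Path, Ext};

normalize(Bin) when is_binary(Bin) ->
    Path = dirname(Bin),
    Base = basename(Bin),
    Ext = extension(Bin),
    {Base, Path, Ext}.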
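
One thing the module above does not handle: unicode:characters_to_list/2 does not always return a list. On invalid input it returns an {error, ...} tuple, and on input that stops in the middle of a character it returns an {incomplete, ...} tuple. A possible way to guard against that is sketched below. The module name utf16_decode_sketch, the function decode/1 and the error atoms are my own assumptions, not part of the original code.

```erlang
-module(utf16_decode_sketch).
-export([decode/1]).

%% Hypothetical wrapper: turns the three possible results of
%% unicode:characters_to_list/2 into {ok, String} or {error, Reason}.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, {utf16, little}) of
        {error, Decoded, Rest} ->
            %% Invalid UTF-16 somewhere in the input.
            {error, {invalid_utf16, Decoded, Rest}};
        {incomplete, Decoded, Rest} ->
            %% Input ended in the middle of a character.
            {error, {truncated_utf16, Decoded, Rest}};
        String when is_list(String) ->
            {ok, String}
    end.
```

For instance, utf16_decode_sketch:decode(<<65,0,66,0>>) gives {ok,"AB"}, while a binary with an odd number of bytes ends up in the truncated_utf16 branch instead of reaching the "filename" functions as a tuple.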

2 comments:

Unknown said...

Thanks a lot for coming back. Your blog has been really helpful to me.

Antoine said...

You're welcome,

Sticky