
 July 2012 Archives

2012, July 30 (Mon)

Netzwelt unicode normalization

Do you know the difference between 郎 (láng) and 郎 (láng)? One is the Chinese character for fiancé and the other one is a so-called compatibility form, which is used in Korea. There is a slight difference in the appearance of the characters, but depending on your font they might look exactly the same.

Why would one care? Usually one would not. But suppose you copy (Chinese) text from a web page and turn it into a PDF file using pdflatex with the CJK package, and get errors about missing glyphs in the font along with metafont errors. You finally track this down to these characters, which GTK “helpfully” turns into the same 郎 when pasting. (Luckily XEmacs did not; use C-u C-x = to see information about a character.)

How to fix it? Either get a font with all those compatibility glyphs or, since I can’t tell the difference anyway and might not even recognize the character properly, just normalize them back to their “typical” characters. For the latter, Perl does the job quickly and nicely: with Unicode::Normalize loaded, run s/\p{Han}+/NFKD($&)/ge to replace every run of Han characters with its “compatibility decomposition”. (Using + instead of * avoids matching the empty string at every non-Han position.) See man Unicode::Normalize for details.
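If Perl is not at hand, the same idea can be sketched in Python with the standard unicodedata module. This is only a rough equivalent, not the Perl one-liner itself: Python's re module has no \p{Han} class, so the sketch assumes it is enough to target the two CJK compatibility ideograph blocks (U+F900..U+FAFF and U+2F800..U+2FA1F), and the function name is made up for illustration.

```python
import re
import unicodedata

# Assumption of this sketch: only the CJK Compatibility Ideographs block
# (U+F900..U+FAFF) and its supplement (U+2F800..U+2FA1F) are rewritten,
# instead of every Han character as in the Perl one-liner.
COMPAT_IDEOGRAPHS = re.compile("[\uF900-\uFAFF\U0002F800-\U0002FA1F]+")


def normalize_compat_ideographs(text: str) -> str:
    """Replace runs of compatibility ideographs with their NFKD form."""
    return COMPAT_IDEOGRAPHS.sub(
        lambda match: unicodedata.normalize("NFKD", match.group(0)),
        text,
    )


if __name__ == "__main__":
    sample = "\uF92C"  # the compatibility form of U+90CE
    fixed = normalize_compat_ideographs(sample)
    # Print the code points, since the glyphs may look identical.
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Text outside those two blocks (Latin ligatures, full-width forms and so on) is left alone, which is the same reason the Perl version restricts the substitution to Han characters.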


2012, July 31 (Tue)

Programmierung g++ make depend generation

If you look for automatic Makefile dependency rule generation for the g++ compiler, there are a lot of different solutions on the Internet. I’m not sure which one is the best, but it surely involves the -MM or -MMD switch to g++.

I found the following to work well enough at the end of my simple project Makefile:

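# Make .d (dependency) files a known suffix.
.SUFFIXES: .d

# Generate name.d from name.cpp. Both name.d and name.o are named as targets,
# so name.d itself is regenerated when a header changes. Use $< (the .cpp
# file only): once name.d is included, $^ would also list every header.
.cpp.d:
	$(CXX) -MM -MT $@ -MT $(patsubst %.d,%.o,$@) $(CXXFLAGS) -o $@ $<

include $(patsubst %.cpp,%.d,$(SRC))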

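The include directive makes sure that the necessary name.d rule files are generated: GNU make treats included files as targets, builds any that are missing or out of date using the rule above, and then re-reads the Makefile.

This requires the source files to be defined in SRC beforehand, for example: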

SRC := $(wildcard *.cpp)