Memo: perl tips

Memo 2011.06.16 18:26

Perl special variables:

/usr/bin/perl -p -i -e "s/$1/$2/g" $3

Add a call to self.delete_cache() to the tearDown(self): method of every .py file in a directory:

find . -name '*.py' -exec /usr/bin/perl -p -i -e 's/tearDown\(self\):(?!.*tearDown\(self\):)/$&\n        self.delete_cache()/s' {} \;

Take a value out of the middle and move it to the end:

perl -p -i -e "s/(.*)('This is Subject')(, )(.*)(\],$)/\1\4 ,\2],/g"


Input: ['19', '1', 'MID19', 'This is Subject', 'Y', '1019', 'Y', '5'],


Output: ['19', '1', 'MID19', 'Y', '1019', 'Y', '5' ,'This is Subject'],

Deleting matching lines:

Perl has no direct counterpart to sed's /re/d command (:g/re/d in ex and vi), which deletes every line that matches the pattern.
A small trick does the job, though.
The -n command line option tells perl to run the given commands for every line without printing the line, while -p runs the commands and also prints each line. With -n you print only the lines you want to keep, so the matching lines are deleted:

perl -n -i -e 'print unless /make_more_abridged_response/'

perl command line options -p and -n (from perlrun manpage):

       -n   causes Perl to assume the following loop around your program, which makes it iterate over filename
            arguments somewhat like sed -n or awk:

                while (<>) {
                    ...             # your program goes here
                }

            Note that the lines are not printed by default.  See -p to have lines printed.  If a file named by an
            argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.

            Also note that "<>" passes command line arguments to "open" in perlfunc, which doesn't necessarily
            interpret them as file names.  See  perlop for possible security implications.

            Here is an efficient way to delete all files that haven't been modified for at least a week:

                find . -mtime +7 -print | perl -nle unlink

            This is faster than using the -exec switch of find because you don't have to start a process on every
            filename found.  It does suffer from the bug of mishandling newlines in pathnames, which you can fix if
            you follow the example under -0.

            "BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just
            as in awk.

       -p   causes Perl to assume the following loop around your program, which makes it iterate over filename
            arguments somewhat like sed:

                while (<>) {
                    ...             # your program goes here
                } continue {
                    print or die "-p destination: $!\n";
                }

            If a file named by an argument cannot be opened for some reason, Perl warns you about it, and moves on
            to the next file.  Note that the lines are printed automatically.  An error occurring during printing
            is treated as fatal.  To suppress printing use the -n switch.  A -p overrides a -n switch.

            "BEGIN" and "END" blocks may be used to capture control before or after the implicit loop, just as in
            awk.

Adding lines to multiple files:

perl -pi -le 'print "/*\n * \$Id\$\n *\n * vim 설정\n * vim:ts=4:shiftwidth=4:et:cindent:fileencoding=utf-8:\n */\n" if $. == 1; close ARGV if eof' $(cat filelist.txt)

Tidying up if statements (removes the padding inside the parentheses):

find src -name '*.cpp' -exec perl -p -i -e "s/if\s*\(\s*(.*) \)\s*$/if \(\1\)\n/g" {} \;

Exact word matching

Just wrap the pattern in '\b' (word boundary) anchors: 's/\bint\b/int32_t/g'

Removing trailing whitespace

find . -name '*hpp' -exec /usr/bin/perl -p -i -e 's/ +$//g' {} \;

Matching as few characters as possible (non-greedy regex)

[irteam@ccassandra01.nm bin]$ echo "" | perl -pe "s/http:.*\//THIS/g"


[irteam@ccassandra01.nm bin]$ echo "" | perl -pe "s/http:.*?\//THIS/g"


Adding '?' after '*' does the magic: the quantifier becomes non-greedy and matches as few characters as possible.


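A quick update: anime and dramas I am watching in Q2 2011

Anime, Drama 2011.06.04 00:59

These days I get through my stress thanks to あの日見た花の名前を僕達はまだ知らない。 (We Still Don't Know the Name of the Flower We Saw That Day, "Anohana" from here on), which airs Thursdays at 25:00, and 花咲くいろは (Hanasaku Iroha), which airs Sundays at 22:00.

Today's episode of "Anohana" was really gripping. Especially near the end, I found myself saying "You should have done that ages ago, come on!"... I am really looking forward to next Friday.

I normally skip the opening and ending when I watch anime.
But I really like that one scene in the ending of "Anohana" (the part that makes me go "haak!"), so I watch the ending every week without fail :D

With "Hanasaku Iroha", I have a feeling of 'what is this, is it going to turn into a muddy free-for-all?'... I have had it for a while and it keeps getting stronger, but the animation quality of every single episode is at theatrical-film level, so I cannot miss it.

The two shows take turns, one being more fun this week and the other last week, and between them they do a great job of relieving my stress.

There is one more, but it is full of toilet humor and below-the-belt gags and is close to adult material, so it is a little awkward to introduce here.

Anyway, to introduce it: it is よんでますよ、アザゼルさん。 (Yondemasuyo, Azazel-san), Fridays at 23:00.
The dialogue is fast and full of vulgar words, which makes it a bit hard to follow (the subtitle maker also seems to have gone quiet after an episode or two),
but it makes me laugh out loud, so I watch a batch of episodes every now and then.

Oh, and!!! I am also following the second season of JIN, but... dramas are so long that I cannot find the time to watch, so the episodes are just piling up.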

  1. diemall 2011.07.28 01:00

    Hmm.. you have turned into a full-fledged otaku. Please post some good articles, haha

How to generate SIGBUS on x86 processors

Computing 2011.05.15 01:54


0. No SIGBUS on x86?!
1. Why?
2. How to tell x86 to warn me about an unaligned memory access?
3. So, what? - A real world application
4. Possible worry
5. When programming for Intel's CPUs, no need to care about alignment?

0. No SIGBUS on x86?!

According to Wikipedia, there are two cases in which a processor generates a bus error:
1. non-existent address
2. unaligned memory access.

Strangely, you may never have seen such an error on x86 processors.
Compile the following code and run it on an x86 machine:

You will find that the program runs without any problem on x86 machines:

shawn.r2:~/work/aligntest$ uname -a
Linux r2 2.6.18-194.el5xen #1 SMP Tue Mar 16 22:01:26 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
shawn.r2:~/work/aligntest$ gcc a.c
shawn.r2:~/work/aligntest$ ./a.out

By contrast, if you try it on SPARC or IA64 machines, you will definitely end up with a bus error:

shawn.sx1000:~/work/align$ uname -a
HP-UX sx1000 B.11.31 U ia64 1177235479 unlimited-user license
shawn.sx1000:~/work/align$ cc a.c
shawn.sx1000:~/work/align$ ./a.out
Bus error (core dumped)

shawn.v880:~/work/align$ cc a.c
shawn.v880:~/work/align$ uname -a
SunOS v880 5.10 Generic_142900-01 sun4u sparc SUNW,Sun-Fire-880
shawn.v880:~/work/align$ ./a.out
Bus Error (core dumped)

1. Why?

Well, my speculation is that x86 processors take care of this kind of unaligned memory access themselves, at the microinstruction level.

Intel provides a way to switch off this feature. According to Intel's IA-32 system programming guide, the EFLAGS register has a flag called AC (Alignment Check), which is bit 18 of the register. Once the AC flag is turned on, unaligned accesses raise bus errors. The check applies only to code running at CPL 3.

CPL is short for Current Privilege Level. Intel's processors (the x86 family) have four privilege levels: 0, 1, 2, and 3. Usually CPL 0 is used by the kernel (privileged mode), and CPL 3 is for user-level processes.

2. How to tell x86 to warn me about an unaligned memory access?

Now, how do you turn it on? First, push the contents of the EFLAGS register onto the stack with the PUSHF instruction. Then set bit 18 of the value on top of the stack, which is the current value of EFLAGS. Finally, pop it back into the EFLAGS register (RFLAGS on x86_64) with the POPF instruction. The following is the assembly code (AT&T syntax):

Note that on a 32-bit x86 you need to use the ESP register instead, as noted in the comment.
ESP is the 32-bit stack pointer and RSP is the 64-bit one. An x86_64 processor uses RFLAGS instead of EFLAGS, so use the stack pointer that matches.
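As a quick check on the bit position described above: bit 18 corresponds to the mask 0x40000, which is the value to OR into the saved flags. A minimal sketch using shell arithmetic; the sample flags value is arbitrary and only illustrative.

```shell
# Bit 18 of EFLAGS/RFLAGS is the AC flag; compute its mask
ac_mask=$(( 1 << 18 ))
printf 'AC mask: 0x%x\n' "$ac_mask"

# Setting the bit in a sample flags value (582 is 0x246, an arbitrary example)
flags=582
with_ac=$(( flags | ac_mask ))
printf 'flags: 0x%x -> 0x%x\n' "$flags" "$with_ac"
```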

Now, insert this assembly code into the original source code:

Behold, now you have SIGBUS on an x86 processor:

shawn.r2:~/work/aligntest$ gcc a.c
shawn.r2:~/work/aligntest$ ./a.out
Bus error (core dumped)
shawn.r2:~/work/aligntest$ uname -a
Linux r2 2.6.18-194.el5xen #1 SMP Tue Mar 16 22:01:26 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

3. So, what? - A real world application

We use a bunch of workstations at my workplace. People prefer the x86 Linux systems because the GNU tools make them more convenient for developers. They are also faster than the Solaris, HP-UX, or AIX systems, because the non-x86 systems were all bought at least 5 years ago.

But this preference for x86 Linux over Sun's or IBM's systems brings a problem, because x86 processors do not detect unaligned memory access. For that reason, I have constantly urged my crew to test their programs on Solaris (on SPARC, of course, not on Solaris x86). But people are not pleased to use slow, crowded Solaris machines when they have fast, new x86 Linux machines.

It would be very good to apply this one line of assembly code to our product, because it would let programmers detect unaligned memory accesses even on x86 machines. It would be horribly embarrassing if our product crashed, especially because of a bus error! It would imply that we do not test our product thoroughly, and the company's credibility would be undermined. I am happy to have an easier way to prevent that beforehand.

4. Possible worry

Some people might worry that turning the AC flag on would have side effects on other processes running on the system. There is nothing to worry about, because the flags register is part of the process's context, so the change only affects the process that made it. When a context switch takes place, the operating system pushes EFLAGS and a bunch of other registers onto the stack of the process that is going into the background, and then loads the context of the process selected to run next. The context of a process includes the EFLAGS register.

This is my speculation and I have not tested it yet: if you want to enable alignment checking system-wide, you would set the AM (Alignment Mask) flag in the CR0 control register instead of the AC flag in the EFLAGS register. I am not sure whether it is true for the time being, but I am going to test it tomorrow. If it is true, the system I test it on might go down, though :-)


Unfortunately, you must be in ring 0, or CPL 0, to access CR0. That is, you can set the flag in CR0 only from a kernel module :-p. And my speculation turned out to be true lol


Added 12 March, 2011:

5. When programming for Intel's CPUs, no need to care about alignment?

Intel CPUs allow unaligned access to words, doublewords, and quadwords. They do not generate a GP (General Protection) exception, which would cause a SIGBUS signal, even if a memory access is unaligned.

So does it make sense that programmers who work on Intel CPUs need not care about address alignment? Yes, it does.

But programmers SHOULD know that an unaligned memory access requires an additional memory bus cycle even on Intel CPUs. Intel's CPU manual says:

A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

- Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Section 4.1.1

So, it is always good practice for programmers to make every effort to use aligned memory access even on Intel's machines.
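The rule quoted from the manual can be checked with a little arithmetic: an operand crosses a boundary when its first and last bytes fall into different boundary-sized blocks. A minimal sketch; the helper name crosses_boundary and the sample addresses are hypothetical.

```shell
# crosses_boundary ADDR SIZE BOUNDARY
# Succeeds (exit status 0) when the SIZE bytes starting at ADDR straddle a BOUNDARY-byte block.
crosses_boundary() {
    first=$(( $1 / $3 ))
    last=$(( ($1 + $2 - 1) / $3 ))
    [ "$first" -ne "$last" ]
}

# Doubleword (4 bytes) at address 4098 (0x1002): covers 0x1002..0x1005
if crosses_boundary 4098 4 4; then echo "4098: unaligned"; else echo "4098: aligned"; fi
# Doubleword at address 4100 (0x1004): covers 0x1004..0x1007
if crosses_boundary 4100 4 4; then echo "4100: unaligned"; else echo "4100: aligned"; fi
```

By this definition a quadword at 0x1004 is also unaligned, because it crosses the 8-byte boundary at 0x1008.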


Post date changed. Original date: 2010/10/27 12:12

  1. phlow 2010.10.28 10:14

    This looks useful. It is worth spreading across the whole company.

    • Orchistro 2010.10.28 17:16

      It looks hard to apply right away.
      It is only three lines of assembly and a couple of #ifdef compiler directives, yet people found it quite troublesome, haha.
      Nothing I can do about it.
      Work seems to be getting harder and harder.

  2. Jamie Fargen 2011.05.31 21:02

    Even better, make your developers commit their code to a repo, and use scripts to compile and test binaries across all the supported platforms.

    • Orchistro 2011.06.06 21:52

      Sure. Even though an automated daily build system is running, every developer has to check that their code passes the whole regression test set, at least on the platform they developed it on. But it is not their responsibility to check that their code runs fine on ALL platforms. That is checked automatically by the daily build script run from cron.

CDPATH and bash_completion in ubuntu 11.04: an annoying combination

Computing 2011.05.11 14:33

Bash has long had a handy feature called 'tab completion'.

But on the ubuntu distribution, this cool feature becomes something of a pain in the ass. Why? Look at the following example:

shawn.ygdrasil:~/work/bash_completion$ mkdir foo
shawn.ygdrasil:~/work/bash_completion$ mkdir bar
shawn.ygdrasil:~/work/bash_completion$ ls
bar/  foo/
shawn.ygdrasil:~/work/bash_completion$ export CDPATH=~/work/bash_completion
shawn.ygdrasil:~/work/bash_completion$ cd foo
shawn.ygdrasil:~/work/bash_completion/foo$ cd bar

OK, so far, so good.

shawn.ygdrasil:~/work/bash_completion/bar$ cd f

After you type "cd f" and press the TAB key, "f" suddenly becomes "foo/", even though there is no directory at all inside the bar directory:

shawn.ygdrasil:~/work/bash_completion/bar$ cd foo/

Some would say that this is nicer, and obviously the ubuntu guys think so. But I absolutely do not. Since I don't want to get caught up arguing about which one is better, let's cut to the chase.

How do you make bash's cool tab completion feature stop searching CDPATH for matches?

Bash (at least on ubuntu distributions) reads /etc/profile when it is invoked, and /etc/profile sources /etc/profile.d/*.sh. And there is this file: /etc/profile.d/ This file in turn sources /etc/bash_completion, which contains the function that searches for directory names when the TAB key is pressed. The name of the function is "_cd()".

If you don't want bash completion to search CDPATH for matches, simply comment out part of the _cd() function in /etc/bash_completion, like this:

# This meta-cd function observes the CDPATH variable, so that cd additionally
# completes on directories under those specified in CDPATH.
_cd()
{
    local cur IFS=$'\n' i j k
    _get_comp_words_by_ref cur

    # try to allow variable completion
    if [[ "$cur" == ?(\\)\$* ]]; then
        COMPREPLY=( $( compgen -v -P '$' -- "${cur#?(\\)$}" ) )
        return 0
    fi

    # Use standard dir completion if no CDPATH or parameter starts with /,
    # ./ or ../
    if [[ -z "${CDPATH:-}" || "$cur" == ?(.)?(.)/* ]]; then
        _filedir -d
        return 0
    fi

    local -r mark_dirs=$(_rl_enabled mark-directories && echo y)
    local -r mark_symdirs=$(_rl_enabled mark-symlinked-directories && echo y)

    # we have a CDPATH, so loop on its contents
    # (the loop is commented out so that completion no longer searches CDPATH)
    # for i in ${CDPATH//:/$'\n'}; do
    #     # create an array of matched subdirs
    #     k="${#COMPREPLY[@]}"
    #     for j in $( compgen -d $i/$cur ); do
    #         if [[ ( $mark_symdirs && -h $j || $mark_dirs && ! -h $j ) && ! -d ${j#$i/} ]]; then
    #             j="${j}/"
    #         fi
    #         COMPREPLY[k++]=${j#$i/}
    #     done
    # done

    _filedir -d

    if [[ ${#COMPREPLY[@]} -eq 1 ]]; then
        i=${COMPREPLY[0]}
        if [[ "$i" == "$cur" && $i != "*/" ]]; then
            COMPREPLY[0]="${i}/"
        fi
    fi

    return 0
}

Is the pain in the ass gone?

Good luck.

tags : bash, ubuntu

ssh takes too long to connect

Computing 2011.05.02 10:48

First check whether your ssh_config file has the following entries in it:

    GSSAPIAuthentication no
    GSSAPIDelegateCredentials no

If GSSAPIAuthentication is set to yes, try changing it to no.

If that does not help, refer to the OpenSSH FAQ:

In most cases, the problem is resolved by adding a 'UseDNS no' entry to the sshd_config of the server you want to connect to. Note that the file name is sshd_config, not ssh_config.

tags : ssh