[Alpine-info] SOLVED: Alpine is converting á to ??
Mike Miller
mbmiller+l at gmail.com
Sun Mar 8 10:28:17 PDT 2009
The original problem was explained as follows:
When I pipe text into Alpine, some characters are replaced with ??. Here
is an explicit example:
$ echo Chavez | perl -pe 's/a/\341/g'
Chávez
$ echo Chavez | perl -pe 's/a/\341/g' | \alpine ""
That opens Alpine (2.0) where I see this:
Ch??ez
Eduardo noted that Alpine is expecting UTF-8, but I am sending ISO-8859-1.
I could change the code above to use UTF-8, but that wouldn't solve the
real problem, which is that text comes to me, e.g., from the NY Times, in
ISO-8859-1 format.
Luckily, my Ubuntu GNU/Linux system has a very handy converter called
"iconv"...
http://www.manpagez.com/man/1/iconv/
...(that I was not previously aware of or you wouldn't have heard from me
at all!). Here's how it works for the current problem:
echo Chavez | perl -pe 's/a/\341/g' | iconv --from-code=ISO-8859-1 --to-code=UTF-8 - | \alpine ""
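To check the conversion without opening Alpine, the bytes can be dumped with od. This is only a minimal sketch: the function name latin1_to_utf8 is made up for illustration, and it assumes the input really is ISO-8859-1.

```shell
#!/bin/bash
# Hypothetical helper (not part of the nyt script): convert ISO-8859-1
# text on stdin to UTF-8 on stdout, using the short iconv option names.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# \341 is the single ISO-8859-1 byte for a-acute. After conversion it
# should appear as the two UTF-8 bytes \303 \241 in the octal dump.
printf 'Ch\341vez\n' | latin1_to_utf8 | od -An -b
```

If the dump shows 303 241 where the single 341 byte used to be, the text is in the form Alpine expects.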
In case you are interested, you can see the scripts I use below. The
second one, named "fixquotes", is called by the first one. I would like a
better way to do that "fixquotes" step -- it tries to fix annoying "smart
quotes" and similar kinds of weird characters for quoting and dashes. It
seems to work most of the time.
Best,
Mike
Here is the script that I call "nyt". It grabs the web page and tries to
put it into an email message in respectable shape:
#!/bin/bash
# Syntax:
#
# nyt URL
#
# will expand URL for printable page
export http_user="nyt_username"
export http_passwd="nyt_password"
# given URL
export URL="$(echo $1)"
# short URL
export URLs="$(echo $1 | gawk -F'?' '{print $1}')"
# print URL
export URLp="$(echo $1 | gawk '{print $0"&pagewanted=print"}')"
wget -q -O - --cookies=on --save-cookies=cookie.txt --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" --http-user=$http_user --http-passwd=$http_passwd "$URL" &> /dev/null
wget -nv -O ~/NYTIMES.TEMPFILE.html --referer="$URL" --cookies=on --save-cookies=${HOME}/cookie.txt --user-agent="Mozilla/5.0 ( X11 ; U ; Linux i686 ; en-US ; rv:1.9.0.3 ) Gecko/2008092416 Firefox/3.0.3" --http-user=$http_user --http-passwd=$http_passwd "$URLp"
echo -ne "${URLs}\n\n" >| ~/NYTIMES.TEMPFILE
echo -ne "N.Y. Times\n\n" >> ~/NYTIMES.TEMPFILE
perl -pe 's/?/--/g ; s/’/\047/g ; s/&#822[01];/\042/g ; s/—/--/g ; s/á/\341/g' ~/NYTIMES.TEMPFILE.html | fixquotes - >| ~/NYTIMES.TEMPFILE2.html
lynx -dump -nolist ~/NYTIMES.TEMPFILE2.html >> ~/NYTIMES.TEMPFILE
perl -pe 's/^[\t ]+//' ~/NYTIMES.TEMPFILE | fmt -w 75 | iconv --from-code=ISO-8859-1 --to-code=UTF-8 - | \alpine ""
rm ~/NYTIMES.TEMPFILE*
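The short-URL and print-URL steps near the top of the script could also be sketched with bash parameter expansion instead of gawk. The function names and the example.com addresses below are placeholders, not part of the original script. One deliberate difference: the script always appends "&pagewanted=print", while this sketch uses "?" when the URL has no query string yet.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative only.

# Short URL: drop everything from the first '?' onward.
short_url() {
    printf '%s\n' "${1%%\?*}"
}

# Print URL: append pagewanted=print, choosing '?' or '&' depending on
# whether the URL already carries a query string.
print_url() {
    case "$1" in
        *\?*) printf '%s\n' "$1&pagewanted=print" ;;
        *)    printf '%s\n' "$1?pagewanted=print" ;;
    esac
}

short_url 'http://www.example.com/story.html?ref=world'
print_url 'http://www.example.com/story.html?ref=world'
print_url 'http://www.example.com/story.html'
```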
Here is the "fixquotes" script that is called by the "nyt" script:
#!/usr/bin/bash
cat "$1" | tr '\221\222\223\227' '\047\047\042\055' | perl -pe 's/—/--/g' | tr '\322\323\324\326\327' '\047\042\042\055\055' | tr '\202\203\210\224' '\055\055\053\042' | perl -pe 's/\^"/"/g' | perl -pe "s/\^'/'/g" | perl -pe 's/\^\$/--/g' | perl -pe 's/\^#/--/g' | perl -pe 's/?/--/g' | perl -pe 's/?/.../g' | perl -pe 's/\^-/--/g'