[Alpine-info] SOLVED: Alpine is converting á to ??

Mike Miller mbmiller+l at gmail.com
Sun Mar 8 10:28:17 PDT 2009


The original problem was explained as follows:

When I pipe text into Alpine, some characters are replaced with ??. Here
is an explicit example:

$ echo Chavez | perl -pe 's/a/\341/g'
Chávez

$ echo Chavez | perl -pe 's/a/\341/g' | \alpine ""

That opens Alpine (2.0), where I see this:

Ch??ez


Eduardo noted that Alpine expects UTF-8, but I am sending ISO-8859-1. I
could change the code above to emit UTF-8, but that wouldn't solve the
real problem, which is that text comes to me (e.g., from the NY Times) in
ISO-8859-1.
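The mismatch is easiest to see at the byte level. This is a minimal illustration using the standard `printf` and `od` tools, not part of the original scripts. In ISO-8859-1, á is the single byte 0xE1 (octal 341, the `\341` in the Perl example). In UTF-8, it is the two-byte sequence 0xC3 0xA1. A lone 0xE1 followed by an ordinary ASCII letter is not valid UTF-8, so a UTF-8 reader has to show a placeholder instead. That is presumably where Alpine's "??" comes from.

```shell
#!/bin/bash
# Dump "Chávez" as hex bytes in both encodings.

# ISO-8859-1: á is one byte, octal \341 (0xE1).
printf 'Ch\341vez' | od -An -tx1
# -> 43 68 e1 76 65 7a

# UTF-8: á is two bytes, octal \303\241 (0xC3 0xA1).
printf 'Ch\303\241vez' | od -An -tx1
# -> 43 68 c3 a1 76 65 7a
```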

Luckily, my Ubuntu GNU/Linux system has a very handy converter called
"iconv"...

http://www.manpagez.com/man/1/iconv/

...(which I was not previously aware of, or you wouldn't have heard from
me at all!). Here's how it works for the current problem:

echo Chavez | perl -pe 's/a/\341/g' | iconv --from-code=ISO-8859-1 --to-code=UTF-8 - | \alpine ""
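If some sources already send UTF-8, running them through `--from-code=ISO-8859-1` would double-encode them. The following is a sketch, not something from the original post, of one way to guard against that. It relies on `iconv` exiting with a non-zero status when it hits an invalid input sequence, so a UTF-8 to UTF-8 conversion doubles as a validity check. The helper name `to_utf8` is hypothetical, and the fallback assumes that anything that is not valid UTF-8 is ISO-8859-1.

```shell
#!/bin/bash
# Hypothetical helper: read stdin and write UTF-8 to stdout.
# Valid UTF-8 passes through unchanged; anything else is assumed to be
# ISO-8859-1 and converted.
to_utf8() {
  local tmp
  tmp=$(mktemp) || return 1
  cat > "$tmp"                      # buffer stdin so it can be read twice
  # iconv fails on an invalid UTF-8 sequence, so this is a validity test.
  if iconv -f UTF-8 -t UTF-8 "$tmp" > /dev/null 2>&1; then
    cat "$tmp"                      # already UTF-8: pass through
  else
    iconv -f ISO-8859-1 -t UTF-8 "$tmp"   # assumed Latin-1: convert
  fi
  rm -f "$tmp"
}

# Usage: both lines print "Chávez" as UTF-8.
printf 'Ch\341vez\n' | to_utf8
printf 'Ch\303\241vez\n' | to_utf8
```

In a pipeline like the one above, `to_utf8` would take the place of the `iconv` stage.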

In case you are interested, the scripts I use are below. The second one,
named "fixquotes", is called by the first one. I would like a better way
to do the "fixquotes" step -- it tries to replace annoying "smart quotes"
and similar odd quote and dash characters with plain ASCII. It seems to
work most of the time.
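One possible alternative to flattening smart quotes is to use a different conversion technique. This is offered as an untested assumption about the NYT pages, not as a verified fix. Many pages labelled ISO-8859-1 actually contain Windows-1252 bytes in the 0x80-0x9F range, which is where octal \221-\227 in "fixquotes" fall. Windows-1252 assigns curly quotes and dashes to those bytes, whereas ISO-8859-1 assigns only control codes. Converting with iconv's `WINDOWS-1252` codec, rather than ISO-8859-1, turns them into real UTF-8 punctuation that Alpine can display. The codec name is the one glibc's iconv accepts; `iconv -l` lists what a given system supports.

```shell
#!/bin/bash
# Octal \223 \224 \226 are Windows-1252 for “ ” –.
# Decoding as WINDOWS-1252 (not ISO-8859-1) keeps them as real UTF-8
# punctuation instead of C1 control characters.
printf 'He said \223yes\224 \226 twice\n' | iconv -f WINDOWS-1252 -t UTF-8
# -> He said “yes” – twice
```

With this approach, the `iconv` step in "nyt" would use `--from-code=WINDOWS-1252`, and some of the `tr` lines in "fixquotes" might become unnecessary.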

Best,
Mike


Here is the script that I call "nyt". It grabs the web page and tries to
put it into an email message in respectable shape:

#!/bin/bash

# Syntax:
#
# nyt URL
#
# will expand URL for printable page

export http_user="nyt_username"
export http_passwd="nyt_password"

# given URL
export URL="$1"

# short URL (everything before the first "?")
export URLs="${1%%[?]*}"

# print URL
export URLp="$1&pagewanted=print"

wget -q -O - --cookies=on --keep-session-cookies --save-cookies="${HOME}/cookie.txt" --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" --http-user="$http_user" --http-passwd="$http_passwd" "$URL" &> /dev/null

wget -nv -O ~/NYTIMES.TEMPFILE.html --referer="$URL" --cookies=on --load-cookies="${HOME}/cookie.txt" --save-cookies="${HOME}/cookie.txt" --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" --http-user="$http_user" --http-passwd="$http_passwd" "$URLp"

echo -ne "${URLs}\n\n" >| ~/NYTIMES.TEMPFILE
echo -ne "N.Y. Times\n\n" >> ~/NYTIMES.TEMPFILE
perl -pe 's/–/--/g ; s/’/\047/g ; s/&#822[01];/\042/g ; s/—/--/g ; s/á/\341/g' ~/NYTIMES.TEMPFILE.html | fixquotes - >| ~/NYTIMES.TEMPFILE2.html
lynx -dump -nolist ~/NYTIMES.TEMPFILE2.html >> ~/NYTIMES.TEMPFILE
perl -pe 's/^[\t ]+//' ~/NYTIMES.TEMPFILE | fmt -w 75 | iconv --from-code=ISO-8859-1 --to-code=UTF-8 - | \alpine ""
rm ~/NYTIMES.TEMPFILE*



Here is the "fixquotes" script that is called by the "nyt" script:

#!/bin/bash

# Map stray 8-bit quote/dash bytes and caret-prefixed sequences to plain ASCII
cat "$1" | tr '\221\222\223\227' '\047\047\042\055' | perl -pe 's/—/--/g' | tr '\322\323\324\326\327' '\047\042\042\055\055' \
  | tr '\202\203\210\224' '\055\055\053\042' | perl -pe 's/\^"/"/g' | perl -pe "s/\^'/'/g" | perl -pe 's/\^\$/--/g' \
  | perl -pe 's/\^#/--/g' | perl -pe 's/–/--/g' | perl -pe 's/…/.../g' | perl -pe 's/\^-/--/g'

