The hard part is extracting the right information from the message headers. First, your grep may not have caught everything: RFC822 headers can be folded across several lines (a continuation line begins with a space or tab), so a line-oriented grep can miss part of a header. A sed script can collect all the relevant lines. But then you still have to parse out the ``from'' portion of each header, which itself comes in several formats.
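To make the folding concrete, here's a minimal Python sketch that unfolds a header and picks out the Received fields. It assumes the whole message is already in a string; the function names are made up for illustration. It matches the field name case-insensitively, since RFC822 field names aren't case-sensitive.

```python
def unfold_headers(message):
    """Return the header fields of an RFC822 message, one string per field.

    A line starting with a space or tab continues the previous field, so it
    is joined onto that field with a single space. The header ends at the
    first empty line; the body is ignored.
    """
    fields = []
    for line in message.splitlines():
        if line == "":
            break  # blank line: end of header, start of body
        if line[0] in " \t" and fields:
            fields[-1] += " " + line.strip()  # folded continuation line
        else:
            fields.append(line.rstrip())
    return fields


def received_values(message):
    """Return the unfolded value of every Received field, in order."""
    values = []
    for field in unfold_headers(message):
        name, sep, value = field.partition(":")
        # RFC822 field names are case-insensitive.
        if sep and name.strip().lower() == "received":
            values.append(value.strip())
    return values
```

Each folded Received header comes back as a single line, which is easier to feed to sort and uniq than the raw physical lines.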
Once you've compiled that list, the rest is easy: just pipe it through sort and uniq.
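For reference, the sort/uniq step amounts to roughly the following Python sketch (`unique_lines` and `count_unique` are hypothetical names). The second function is the equivalent of `sort | uniq -c`, in case you want counts too. Python orders strings by code point, which matches sort under the C locale but not necessarily your locale's collation.

```python
from collections import Counter


def unique_lines(items):
    """Like `sort | uniq`: each distinct item once, in sorted order."""
    return sorted(set(items))


def count_unique(items):
    """Like `sort | uniq -c`: (count, item) pairs in sorted item order."""
    counts = Counter(items)
    return [(counts[item], item) for item in sorted(counts)]
```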
Here's a sed script (one I'm not totally happy with) that prints the contents of all the Received fields in an RFC822 message, one physical line at a time:
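# Print the contents of every Received: header, including its folded
# continuation lines, from an RFC822 message; print nothing else.
# A blank line ends the header: go discard the body.
/^$/ b End
# Start of a Received: header: strip the field name and leading blanks,
# let n print what's left, then look for continuation lines.
/^Received:/ {
s/^Received:[ \t]*//
n
b Cont
}
# Any other header line is dropped.
/./ d
:Cont
# The header may end right after a Received: field.
/^$/ b End
# Continuation line (starts with a space or tab): strip the indent,
# print it, and keep looking.
/^[ \t]/ {
s/^[ \t]*//
n
b Cont
}
# Another Received: header right after this one.
/^Received:/ {
s/^Received:[ \t]*//
n
b Cont
}
d
:End
# Swallow the rest of the message and delete it unprinted. Testing $
# before N means N never runs on the last line, where some seds print.
$ d
N
b End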
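(If your sed doesn't understand `\t', replace each `\t' with an actual tab character. GNU sed accepts `\t' in regular expressions, but many traditional seds have no escape for tabs, only for newlines. Notice that there's a space in front of each `\t' as well; keep it when you substitute the tab, since those brackets are meant to match either a space or a tab.)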
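If you have Python around, its standard `email` module will do the unfolding for you. This is a rough sketch, not a drop-in replacement for the sed script, and the function name is my own. With the default compat32 policy the returned values still contain the folding newlines and tabs, so the code collapses each run of whitespace to a single space.

```python
import email


def received_via_email_module(raw_message):
    """Return the whitespace-normalized value of every Received header."""
    msg = email.message_from_string(raw_message)
    # get_all() returns None if there is no such header, hence "or []".
    values = msg.get_all("Received") or []
    # Collapse folding whitespace ("\n\t", "\n ") to single spaces.
    return [" ".join(str(v).split()) for v in values]
```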
You should be able to build on that to extract just the ``from'' parts of the Received headers, but I'll leave that as an exercise for the reader.
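One possible starting point for the exercise, again in Python. It assumes the common layout where an unfolded Received value begins with `from <host>`; not every header does (locally delivered mail can start with `by`), and it ignores the parenthesized comment that usually follows the host name. The regex and function name are mine, not from any standard.

```python
import re

# Hypothetical pattern: assumes the value starts with "from <host>".
FROM_RE = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)


def from_host(received_value):
    """Return the host after a leading "from", or None if there isn't one."""
    match = FROM_RE.match(received_value)
    return match.group(1) if match else None
```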
That should get you started, anyway. I've goofed off at work long enough as is.