Standard information tag – By Trolling_Around

What

The Standard information tag is, quite simply, an unofficial Yahoo tag that can be used to exchange information between different client software, without the need to code a different test for each client’s own ‘unique’ way of sharing information.

The format is very flexible, with very few ‘do not’s and ‘must’s; instead it has ‘should not’s and ‘should’s.  Since this is all completely voluntary, I would urge you to ‘obey’ any rigidity (marked as [rigid]); if you have a good reason to disagree, please let me know.

Why

With the ever-increasing number of Yahoo chat clients comes an increase in the number of tags that client writers need to check for (or have their client rendered ‘blind’ to a particular chat client’s name and/or features).  Every check requires either ‘code’ or ‘data’, so every NewClient means all other NewClient-aware clients grow in size and spend extra cycles checking for it and any feature tags.  Not to mention the Internet traffic generated (and host bandwidth used) by people downloading the new version.

OK, some of that may be a bit overboard, but simply put, it will in the long run make client coders’ lives easier.

How

The tag itself starts with the standard Yahoo tag angle bracket <

This is followed by the Yahoo keyword font, then a single space character (ASCII char 32)

This is followed by the keyword INF

 

So far that gives us <font INF

 

Data is stored in Keyname:Value pairs.

Keyname is the name of the data (and is always preceded with a space character to separate it from the INF keyword or the preceding data) and Value is the data itself.

The INF tag ends at the next standard Yahoo tag terminator >

Keyname

Should not be more than 8 characters (not including any suffixes).

Should not contain a space character (although it MAY, since the Keyname is terminated at the next colon character).  If a colon is not found, then the Keyname is void and the rest of the INF tag should be considered invalid.

Keynames ARE NOT case sensitive. [rigid]

 

 

 

Keyname Suffixes

The Keyname may end in a suffix, in which case the suffix is removed; it merely serves to indicate HOW the Value should be interpreted.

If a Keyname ends with the % suffix then Value should be interpreted as url-encoded: each % character (within the Value, not the Keyname) will be followed by an ASCII representation of a hexadecimal value, and this (including the % character) should be replaced with the ASCII character itself.  Invalid hex digits should result in the Key being void.

If a Keyname ends with the $ suffix then Value should be interpreted as an ASCII representation of a hexadecimal string; the actual value stored should be the ASCII characters returned by the hexadecimal pairs.

 

If neither of the above suffixes is found then Value is literal.
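
To make the suffix rules concrete, here is a minimal sketch in Python. The function name decode_pair is my own invention, as are the choices to fold the Keyname to lower case and to treat an odd-length or non-hex $ value as void (the text above only spells that out for the % suffix), so treat it as one possible reading rather than the reference behaviour.

```python
from typing import Optional, Tuple

HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_pair(keyname: str, raw: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) with the suffix applied, or None if the Key is void."""
    if keyname.endswith("%"):
        out = []
        i = 0
        while i < len(raw):
            if raw[i] == "%":
                pair = raw[i + 1:i + 3]
                if len(pair) != 2 or not set(pair) <= HEX_DIGITS:
                    return None  # invalid hex digits void the Key
                out.append(chr(int(pair, 16)))
                i += 3
            else:
                out.append(raw[i])
                i += 1
        return keyname[:-1].lower(), "".join(out)
    if keyname.endswith("$"):
        # Assumption: odd length or non-hex characters also void the Key.
        if len(raw) % 2 != 0 or not set(raw) <= HEX_DIGITS:
            return None
        return keyname[:-1].lower(), bytes.fromhex(raw).decode("latin-1")
    # No suffix: the Value is literal.
    return keyname.lower(), raw
```

Folding the Keyname to lower case is just one way of honouring the ‘not case sensitive’ rule; comparing without regard to case at lookup time would do equally well.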

Value

If the first character after the colon is a double quote ( " ) character then Value is terminated at the next double quote character (the quote characters are NOT part of the actual value); otherwise Value is terminated at the next space (ASCII 32) character OR the end of the INF tag. [rigid]

Where the closing quote character is missing, the Keyname:Value pair should be considered void (and the remainder of the INF tag discarded).

The following all set the Key harry to the same value of ABCD:

Harry:ABCD

harry:"ABCD"

HaRry$:41424344

Harry$:"41424344"

Harry%:%41%42%43%44

HarrY%:"%41%42%43%44"

Harry%:%41B%43D

The following both set welcome to hello there (two words separated by a space):

Welcome:"hello there"

Welcome%:hello%20there
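
The tag layout and the Value termination rules can be sketched as a small splitter in Python. This is only an illustration under a few assumptions of mine: the name split_inf_pairs is made up, the "<font INF" prefix is matched exactly as written, Keynames are folded to lower case, and any suffix is left on the Keyname for a separate decoding step.

```python
from typing import List, Tuple


def split_inf_pairs(tag: str) -> List[Tuple[str, str]]:
    """Split '<font INF key:value ...>' into (keyname, raw value) pairs."""
    prefix = "<font INF"
    if not tag.startswith(prefix):
        return []
    end = tag.find(">")  # the INF tag ends at the next '>'
    body = tag[len(prefix):end] if end != -1 else tag[len(prefix):]
    pairs = []
    pos = 0
    while pos < len(body):
        if body[pos] == " ":  # separator before each Keyname
            pos += 1
            continue
        colon = body.find(":", pos)
        if colon == -1:  # no colon: the rest of the tag is invalid
            break
        key = body[pos:colon].lower()  # Keynames are not case sensitive
        pos = colon + 1
        if body.startswith('"', pos):
            close = body.find('"', pos + 1)
            if close == -1:  # missing closing quote: discard the rest
                break
            value, pos = body[pos + 1:close], close + 1
        else:
            space = body.find(" ", pos)
            space = len(body) if space == -1 else space
            value, pos = body[pos:space], space
        pairs.append((key, value))
    return pairs
```

Pairs found before a broken one are kept, which matches the ‘remainder of the INF tag discarded’ wording; a stricter client could choose to throw the whole tag away instead.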

Defining Keynames

Some Keynames have been defined. However, since the INF tag is not owned or governed, anyone can create (or own) Keynames. People should be aware that anyone and everyone is free to NOT implement Keynames, or even to define a new Keyname that serves the same purpose.

When defining a Keyname:

Try to be sure that it hasn’t been used already (ask if you’re not sure).

Try to be sure that there isn’t a Keyname already defined that serves the same purpose. Can you use an already defined Keyname instead?

Whilst someone can ‘redefine’ a Keyname in their client, that will return us to the same per-client test that (I believe) we should be trying to move away from (if ID=”this” then AVA=”that” else… type logic).

 

Defined Keynames

 

Keyname   Purpose                          Example(s)
ID        client ID                        JAM, YHLT, Ymlite, Yzak (alphabetical order)
VER       client version                   5.1.2
AVA       avatar                           REPOSITORYCODE\avatarname (see note below)
TM        local time (hh or hh:mm)         01, 23, 5:15, 02:33
LTIME     local time and date (TDateTime)  38244.9497271528 (see note below)
PROT      protocol                         YCHT, CHAT2, DHTML, YMSG
SUM       checksum                         A423EE1F (see note below)
SEX       sex                              M F U N (Male, Female, Unknown and Neutral)
GOS       munged text                      Secret codes… (see note below)
LOVE      the person/thing you love        "Mrs Troll", Skiing, "making woopey"
GLY       glyph                            18x18 pixel single colour image (see note below)

 

Where

 

The INF tag itself should be the first thing in the speech part of the chat room post.

When

If adding the INF tag to the chat post would cause the post to break a rule (for example, it pushes the total length of the chat post over a server-imposed limit) then the INF tag should either be omitted in part (removing the less important data; I suggest adopting a left to right, important to less important ordering) or be omitted completely.

A more complicated rule could be used; data pairs only ‘need’ be re-transmitted if:

a)      Someone new has joined the room,

Or

b)      The value of the data has changed

However, this method requires the server to be reliable. ;-)
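
The simpler rule from the start of this section (drop the least important pairs from the right until the tag fits, or drop the tag altogether) might look like this in Python. The name build_inf_tag and the room_left parameter are hypothetical, and the sketch assumes values never contain a double quote or a > character (those would need the % suffix).

```python
from typing import List, Tuple


def build_inf_tag(pairs: List[Tuple[str, str]], room_left: int) -> str:
    """Build an INF tag of at most room_left characters, or return ''."""

    def render(items: List[Tuple[str, str]]) -> str:
        parts = []
        for key, value in items:
            # Quote values containing a space so they survive parsing.
            parts.append(f'{key}:"{value}"' if " " in value else f"{key}:{value}")
        return "<font INF " + " ".join(parts) + ">"

    kept = list(pairs)  # ordered most important first
    while kept:
        tag = render(kept)
        if len(tag) <= room_left:
            return tag
        kept.pop()  # drop the right-most (least important) pair
    return ""  # nothing fits: omit the tag completely
```

Here room_left would be whatever is left of the server limit once the speech text itself has been counted; how that limit is found out is outside the scope of this sketch.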

Further Notes

Remember, implementation of Keys is optional.

AVA

Format:            REPOSITORY\avatarname

REPOSITORY is a shorthand reference to a web-site where avatars are held, for example:

CH\            http://www.chat-help.co.uk/~ymlite/identitar_pgg/avas/avatarname.jpg

CC\            http://avatars.cheetachat.com/avatar/avatarname.jpg

If avatarname does not include a file extension (.jpg .bmp .gif etc) then .jpg is assumed.  (In fact, if the avatar is a jpg then the extension should not be included.)

Or:

Http://the.repository.url/the/path/avatarname.ext

However, if you do not understand how this can be abused, you should not implement it. Alternatively, disallow downloading avatars from http:// style repositories by default, and let the end-user opt in to putting their IP address at risk.

In this case the file extension SHOULD be included.  Any client downloading avatars should also check that it is in fact downloading a valid graphics file, NOT the latest virus/exploit/Trojan etc. This includes checking for usable file types: if you cannot display GIFs in your TRichEdit, why download them?

Also, you may want to decide upon a ‘default’ repository. I, for example, try to download from the chat-help repository if no repository prefix is included, simply because the chat-help repository is maintained and the images are ‘screened’ for suitability.
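
As a rough sketch of the AVA rules above, the following Python turns an AVA value into a download URL. The function name avatar_url and the allow_http switch are my own; the two repository entries are copied from the examples in this section, and refusing names that contain a / is an extra precaution of mine, not something the format demands.

```python
from typing import Optional

# Repository prefixes and base URLs copied from the examples above.
REPOSITORIES = {
    "CH": "http://www.chat-help.co.uk/~ymlite/identitar_pgg/avas/",
    "CC": "http://avatars.cheetachat.com/avatar/",
}
DEFAULT_REPOSITORY = "CH"  # fallback when no prefix is given


def avatar_url(value: str, allow_http: bool = False) -> Optional[str]:
    """Return the download URL for an AVA value, or None if it is refused."""
    if value.lower().startswith("http://"):
        # Full URLs are refused unless the end-user has opted in.
        return value if allow_http else None
    repo, sep, name = value.rpartition("\\")
    if not sep:
        repo = DEFAULT_REPOSITORY
    base = REPOSITORIES.get(repo)
    # Assumption: names containing "/" are refused so they cannot leave the folder.
    if base is None or not name or "/" in name:
        return None
    if "." not in name:
        name += ".jpg"  # no extension means .jpg
    return base + name
```

This only builds the URL. Checking that what comes back really is a displayable image still has to happen after the download.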

LTIME

This is the ASCII representation of a TDateTime (double-precision floating point). The integer part represents the number of days since 30 December 1899.  The fraction part is how far ‘through’ a 24 hour day the time is: for example, 0.25 is 6am, 0.5 is 12 midday and 0.75 is 6pm.
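
For clients not written in Delphi, a minimal Python sketch of the conversion follows. It assumes the Delphi TDateTime convention, in which day 0 is 30 December 1899, and it ignores negative values; the function name ltime_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

# Day 0 of a Delphi TDateTime.
TDATETIME_EPOCH = datetime(1899, 12, 30)


def ltime_to_datetime(value: str) -> datetime:
    """Convert an LTIME value such as '38244.9497271528' to a datetime."""
    return TDATETIME_EPOCH + timedelta(days=float(value))
```

Fed the example value from the table of defined Keynames, this lands on the evening of 14 September 2004 (the sender’s local time, since LTIME carries no time zone).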

SUM

This can be used to verify the data that precedes the SUM key in the INF tag.

The value is a reduced MD5 of:

The YahooID of the sender

The Current Chatroom (including the colon and its number)

The contents of the INF tag up to and including the space char preceding the SUM key.

Reduction can be achieved by XORing byte quads (first byte of reduced checksum = byte1 xor byte2 xor byte3 xor byte4 of the original MD5 checksum, second byte of reduced checksum = byte5 xor byte6 xor byte7 xor byte8, and so on), reducing the checksum down to 4 bytes, which in turn can be represented as 8 hex chars. For example, SUM:a56231ff. Do NOT use SUM$:a56231ff, since that would cause the value to be parsed and reduced to 4 bytes rather than an 8 char string.

Any data following a SUM Key should be noted as ‘unverified’.

Why? Because one day it may be useful to verify the INF tag. Yes, it can be faked, but it’s more secure than a Key called “isthisINFok” set to a value of 1 if it is ;-)

I chose MD5 purely because it is a well documented and easily verifiable one-way hashing algorithm which is ‘just’ difficult enough to prevent all but the most die-hard spoofers.
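
Here is one way the reduced MD5 could be computed in Python. The XOR reduction follows the description above; how the three inputs are joined before hashing is NOT spelled out, so the plain concatenation in the listed order, the Latin-1 encoding and the upper-case output in inf_checksum are all assumptions of mine, as are both function names.

```python
import hashlib


def reduce_md5(digest: bytes) -> str:
    """XOR each group of four digest bytes into one byte; return 8 hex chars."""
    reduced = bytes(
        digest[i] ^ digest[i + 1] ^ digest[i + 2] ^ digest[i + 3]
        for i in range(0, 16, 4)
    )
    return reduced.hex().upper()


def inf_checksum(yahoo_id: str, room: str, inf_prefix: str) -> str:
    """Reduced MD5 over the sender ID, the room and the tag up to SUM."""
    # Assumption: the three inputs are concatenated in the listed order,
    # with no separators, and encoded as Latin-1 before hashing.
    data = (yahoo_id + room + inf_prefix).encode("latin-1")
    return reduce_md5(hashlib.md5(data).digest())
```

Since the examples in this document show the checksum in both upper and lower case, a receiving client would be wise to compare the two 8 char strings without regard to case.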

GOS

GOS has no ‘set’ method of ciphering/deciphering the value (all clients could have their own method). It would therefore be prudent to check the ID Key (if present) before deciphering the text, or to include a method of checking the validity of the deciphered text.

GLY

GLY value represents an 18 by 18 pixel 1 bit Image.  Value will always be 55 bytes long (good, quick check for invalid data before you start decoding).

Bytes are encoded using the Y64 method – whereby:

Char     Value

.           0

/           1

0          2

1          3

2          4

3          5

9          11

A         12

B          13

Etc.

The full char set is:

./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

 

where ‘.’ is zero and ‘z’ is sixty-three (63).

The first byte of the value represents the foreground colour of the glyph, containing red, green and blue values stored as 2 bits each (with the possible values of 0 to 3 each).

First, you need a method of converting from Y64 to an ordinal value (byte, integer etc). This function should accept a single character and return an ordinal type, so you feed it the character ‘3’ and it returns the value 5. Care should be taken to abort the decode if invalid (non-Y64) characters are encountered. In what follows, Y64toINT is a function that performs this.
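
A minimal Python version of such a function could look like the sketch below. The name y64_to_int stands in for Y64toINT, and raising ValueError is simply my choice of how to ‘abort’ on a bad character.

```python
Y64_ALPHABET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def y64_to_int(ch: str) -> int:
    """Return the 0..63 value of one Y64 character, or raise ValueError."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    value = Y64_ALPHABET.find(ch)
    if value == -1:
        raise ValueError(f"not a Y64 character: {ch!r}")
    return value
```

A lookup in the full character set keeps the table and the code in one place; a client decoding many glyphs might prefer a precomputed dictionary instead.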

Decoding the colour information

EncodedCol=Y64toINT( First Byte of Value)

Red is stored in bits 4 and 5

Green is stored in bits 2 and 3

Blue is stored in bits 0 and 1

In binary EncodedCol looks like:

 Bits:     7            6            5            4            3            2            1            0

            .            .            R            R            G            G            B            B

so:

red=(EncodedCol / 16) and 3

green=(EncodedCol /4) and 3

blue=EncodedCol and 3

or

red=(EncodedCol >> 4) & 3

green=(EncodedCol >> 2) & 3

blue=EncodedCol & 3

or

red=(EncodedCol shr 4) and 3

green=(EncodedCol shr 2) and 3

blue=EncodedCol and 3

Red, green and blue will now each hold a value in the range 0 to 3. To get a true colour (24 bit) version, simply multiply the red, green and blue values by 85; they will then be in the range 0 to 255.
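
Putting the colour steps together, a short Python sketch (decode_glyph_colour is a made-up name, and the alphabet is repeated so the example stands on its own):

```python
from typing import Tuple

Y64_ALPHABET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_glyph_colour(first_char: str) -> Tuple[int, int, int]:
    """Return the 24 bit (red, green, blue) colour held in the first GLY char."""
    if len(first_char) != 1:
        raise ValueError("expected exactly one character")
    encoded = Y64_ALPHABET.index(first_char)  # ValueError if not Y64
    red = (encoded >> 4) & 3    # bits 4 and 5
    green = (encoded >> 2) & 3  # bits 2 and 3
    blue = encoded & 3          # bits 0 and 1
    # Scale each 0..3 value up to 0..255.
    return red * 85, green * 85, blue * 85
```

So a first character of ‘z’ (63, all six bits set) gives white, and ‘.’ (0) gives black.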

 

The actual image itself occupies the remaining 54 bytes, which comprise 18 rows of 3 bytes each.

The second, third and fourth bytes represent the first scanline of the image. So, for example,

if the 2nd, 3rd and 4th chars are “..z” (not including quotes)

then the values gained from the Y64toINT function would be 0, 0 and 63 respectively,

which when represented in binary gives us: 000000 000000 111111  (the top scanline of our image).

The remaining 17 scanlines are stored in the same way.

Below are the first 7 scanlines of an image, with their respective decimal values and Y64 chars:

ScanLine  1st byte  2nd byte  3rd byte  Decimal values  Y64 chars
0         000000    000000    111111    0,0,63          ..z
1         000000    000000    011111    0,0,31          ..T
2         000000    000000    001111    0,0,15          ..D
3         000000    000000    011111    0,0,31          ..T
4         000000    000000    111011    0,0,59          ..v
5         000000    000001    110001    0,1,49          ./l
6         000011    111110    000000    3,62,0          1y.

The remaining 33 Y64 chars (11 scanlines of 3 chars each) would all be “.” for 0 (no foreground pixels).

Example for decoding the image (in pseudo-code)

All variables are of type integer (at least 8 bits wide).

Let x=0                                // current pixel column
Let y=0                                // current scanline
Let xc=0                               // characters decoded on this scanline
Let i=2                                // character 1 is the colour byte
Let B=Bitmap(size 18 x 18 pixels)      // clear B to backgroundColour
:MainLoop
Let c=Y64toINT(value[i])               // value[i] is the ith character of the GLY value (1=first char)
If c=0 then Let x=x+6 and GOTO SkipInnerLoop     // six blank pixels, nothing to draw
Let xl=0                               // bits handled in this character
:InnerLoop
If (c and 32)=32 then Let B.pixel[x,y]=foregroundColour
Let c=c shl 1
Let x=x+1
Let xl=xl+1
If xl<6 then GOTO InnerLoop
:SkipInnerLoop
Let xc=xc+1
If xc<3 then GOTO SkipNewRow
Let xc=0                               // three characters done: start the next scanline
Let x=0
Let y=y+1
:SkipNewRow
Let i=i+1
If i<56 then GOTO MainLoop