Difference between revisions of "Format specifiers"
Revision as of 13:58, 31 October 2006
- 1 Introduction
- 2 Escape codes
- 3 Format specifiers
- 4 Other stuff
- 5 Overall example
When translating aMule, you might encounter strange-looking things. Sometimes these will be just typos, but sometimes they are there on purpose.
This document is a must-read for anyone translating aMule or willing to do so. It describes all the groups of characters which must never be modified and what they actually mean.
Non-representable ASCII codes
These are codes which represent characters of the ASCII character set that cannot be typed directly on the keyboard.
- \a -> This will normally cause an audible alert (sometimes a visual one), like a beep
- \b -> Goes back one character
- \f -> Form feed; on some systems this will clear the screen
- \n -> Ends the current line and starts a new one, placing the cursor at the beginning
- \r -> Goes to the beginning of the current line
- \t -> Horizontal tabulation
- \v -> Vertical tabulation
- \<octal digits> -> Inserts the character whose code is the given octal value
- \x<hex digits> -> Inserts the character whose code is the given hexadecimal value
Disambiguation escape codes
The following characters can be typed on the keyboard, but due to the syntax of the C++ programming language they must be written this way:
- \? -> Displays a question mark ( ? ) while avoiding trigraph translation (not all compilers translate trigraphs, so it is not always necessary)
- \\ -> Displays a backslash ( \ )
- \' -> Displays a single quote ( ' )
- \" -> Displays a double quote ( " )
The following are some examples of the above escape codes. They are listed as pairs of code line + output: the first line shows the string as it is written in the C++ code, and the second line (or group of lines, if it needs more than one) shows how that string is displayed on execution:
- Code line: "I have 6\b5 fingers on my right hand"
- Output: I have 5 fingers on my right hand
- Code line: "Where is the <RETURN> key???\nAh, here it is!"
- Output: Where is the <RETURN> key???
Ah, here it is!
- Code line: "I am a BIG liar\rI'm married to Marilyn Monroe"
- Output: I'm married to Marilyn Monroe
- Code line: "\141\115\165\x6C\x65"
- Output: aMule (Notice that the octal value of a in the ASCII codeset is 141, the octal value of M is 115, the octal value of u is 165, the hexadecimal value of l is 6C and the hexadecimal value of e is 65)
- Code line: "Isn\'t it complicated to use the \" and \\ characters\?"
- Output: Isn't it complicated to use the " and \ characters?
Basic format specifiers
Format specifiers are groups of characters which will be substituted with something else. The format specifier itself specifies which type of data it will be substituted with:
- %d -> Decimal value (signed integer type). Equivalent to %i
- %i -> Decimal value (signed integer type). Equivalent to %d
- %u -> Natural number (unsigned integer type).
- %x -> Hexadecimal value represented with lowercase characters (unsigned integer type)
- %X -> Hexadecimal value represented with uppercase characters (unsigned integer type)
- %o -> Octal value (unsigned integer type)
- %f -> Floating-point number in normal notation (showing all digits) (both float and double types)
- %e -> Floating-point number in exponential notation using a lowercase e (both float and double types)
- %E -> Floating-point number in exponential notation using an uppercase E (both float and double types)
- %g -> Floating-point number in normal or exponential notation depending on the value. If exponential, a lowercase e will be used (both float and double types)
- %G -> Floating-point number in normal or exponential notation depending on the value. If exponential, an uppercase E will be used (both float and double types)
- %c -> A single character (integer type)
- %s -> A string (array of characters, or pointer to characters)
- %p -> Displays a memory address (pointer type)
- %n -> The variable assigned to this format specifier will receive the number of characters displayed up to now (pointer to integer type)
Sometimes, some characters can be inserted between % and the character representing the type of data. These inserted characters extend the information about the type of data the format specifier is going to be substituted with:
- h -> Will turn into short integer type. Valid for d, i, o, u, x, X and n.
- l -> Will turn into long integer type. Valid for d, i, o, u, x, X and n.
- L -> Will turn into long double type. Valid for e, E, f, F, g and G.
Also, some of the format specifiers allow tweaking a bit how they are output. These tweaking codes must be inserted between % and the type character:
- - -> Aligns to the left
- + -> Prints the plus ( + ) sign even when the number is positive. Valid for d, i, e, E, f, g and G.
- 0 -> Fills the blank spaces with zeros ( 0 ) instead of spaces. Valid for d, i, u, x, X, o, e, E, f, g and G.
- # -> It will act in different ways depending on the type of data:
- o: A zero ( 0 ) will be prepended when the data is non-zero.
- x and X: Prepends 0x or 0X to the data, depending whether the format specifier was x or X.
- f, e, E, g and G: Displays the decimal point even when the data is an integer (no decimals).
- g and G: The trailing zeros are not removed.
- <non-zero decimal value> -> Specifies the minimum width the data must occupy (if not all is occupied, it will be padded). Can be used together with 0.
- .<decimal value> -> It will act in different ways depending on the type of data:
- f, e and E: Specify the amount of decimals it is allowed to display (if .0, no decimals will be displayed).
- s: Maximum amount of characters to display (if .0, no characters will be displayed).
- %% -> This is not a format specifier; it is only used to avoid ambiguity. It displays a single % character.
You must NEVER change the order of the format specifiers; doing so leads to undesired behaviour! They must appear in the translation in the same order as in the original string.
Example: Translating the string "I am %d years old and my name is %s." as "Mein Name ist %s und ich bin %d Jahre alt." (notice that %s and %d are swapped) would probably cause aMule to crash when trying to display this string.
The following are some examples of the above format specifiers. They are listed as pairs of code line + output: the first line shows the string as it is written in the C++ code, and the second line (or group of lines, if it needs more than one) shows how that string is displayed on execution. The data substituted for the format specifiers is not really random; something meaningful has been chosen for each example:
- Code line: "I am %s and I am %d years old."
- Output: I am Jacobo221 and I am 19 years old.
- Code line: "The first letter in the English alphabet is %c"
- Output: The first letter in the English alphabet is A
- Code line: "There exists a format specifier which is %%%c"
- Output: There exists a format specifier which is %E
- Code line: "There exists a format specifier which is %%c"
- Output: There exists a format specifier which is %c
- Code line: "%E and %e are the same number"
- Output: 9.186329E+00 and 9.186329e+00 are the same number
- Code line: "%g is in normal notation while %g is in exponential notation"
- Output: 0.25 is in normal notation while 3.23423e+34 is in exponential notation
- Code line: "%+d is a positive number"
- Output: +5 is a positive number
- Code line: "%010f says: Am I not 0-plenty?"
- Output: 000.250000 says: Am I not 0-plenty?
- Code line: "Both %#o and %#X start with notation specifier"
- Output: Both 0345 and 0X65FC start with notation specifier
- Code line: "%010x must be plenty of zeros"
- Output: 00000065fc must be plenty of zeros
- Code line: "Pi number is %.2f"
- Output: Pi number is 3.14
There are other things you must keep in mind when translating, so keep on reading.
The C++ programming language doesn't consider uppercase and lowercase letters to be the same letters. So, 'a' is a different letter from 'A' for C++ programs. In most cases this will be insignificant for you, but when dealing with escape codes and format specifiers it matters, so please keep them exactly as they are.
Leading and ending blank spaces
You will sometimes find strings which either begin or end with a space. This is not a typo. That space is there for a reason. Usually this reason is one of the following:
- Appearance. Maybe it separates the string from somewhere that, without this separation, would look ugly.
- Future additions. In most cases, it will be because some data is going to be appended to it. Imagine you find a string like "I am " (with an ending space). Although no %s format specifier is found there, some data is going to be added later anyway. So, if the blank space wasn't kept, once translated, in the app it would look like I amme instead of I am me.
The following are examples of the cases explained above. They are grouped into sets of four lines. The first line shows an original string, the second one how it should be translated (the examples are translated into Spanish), the third one a possible output, and the fourth one translates that possible output back into English so that you can understand it in case you don't speak Spanish:
- Code line: " this line looks complete and the leading and ending spaces can be deleted because they seem typos "
- Translation: " esta frase parece completa y los espacios del principio y del final se pueden eliminar porque deben ser un error "
- Output: Aunque esta frase parece completa y los espacios del principio y del final se pueden eliminar porque deben ser un error no lo es porque, como se puede ver, quedaban cosas por añadir, lo cual demuestra que nunca se deben eliminar esos espacios
- Output (in English): Although this line looks complete and the leading and ending spaces can be deleted because they seem typos it is not because, as you can see, there were still words to be added, so this shows you why you should never delete those spaces
- msgid="I am %s and I a %d-year old\nand I\'m a happy \"aMule\" user "
- Would become (translation to Spanish):
- msgid="I am %s, %d years old\nand &I\'m a happy \"rabitty-aMule\" user "
- msgstr="Soy %s y tengo %d años\ny soy un fel&iz usuario del \"conejillo-aMule\" "
- %s and %d must be copied literally since they will be substituted in the program with some string or number. General rule: anything between a % character and the next letter character (that is, a, b, c, etc.) or percentage character (%) must be copied literally.
- \n must be copied literally too since it breaks the line into a new line. The general rule is: \ and its very next character (can be \, a, n, t, f, ", ', etc.) must be copied literally.
- & must be placed before the very same letter in the translation since it indicates the combination ALT+letter that will select that option.
- The ending space is left since it was there in the original message. Never remove starting or ending spaces, even if they look ugly. They are there for some reason. Normally this will be because either before or after that string comes another string. For example: "Opening file " has a blank space at the end, so you can expect that right after it the name of a file will be displayed. Something like Opening file server.met