American Standard Code for Information Interchange
The American Standard Code for Information Interchange (ASCII, also US-ASCII) is a 7-bit character encoding. It corresponds to the U.S. variant of ISO 646 and serves as the basis for later character encodings that use more bits.
ASCII was approved on 17 June 1963 as the standard ASA X3.4-1963, revised in 1967, and most recently updated in 1968 (ANSI X3.4-1968). The encoding defines 128 characters, consisting of 33 non-printable and 95 printable characters. The latter begin with the space character and are listed in the table under Composition.
The printable characters include the Latin alphabet in upper and lower case, the ten Arabic numerals, and some punctuation marks and other special characters. The character set is similar to that of a keyboard or typewriter for the English language. In computers and other electronic devices that represent text, the text is usually stored in ASCII or in a backward-compatible encoding (ISO 8859, Unicode).
The non-printable control characters include output characters such as line feed or tab, protocol characters such as end of transmission or acknowledgment, and separator characters such as the record separator.
Encoding
Each character is assigned a bit pattern of 7 bits. Because each bit can assume two values, there are 2^7 = 128 different bit patterns, which can also be interpreted as the integers 0 to 127 (hexadecimal 00 to 7F).
Special characters used in languages other than English, such as the German umlauts, cannot all be represented with the 7-bit code; that would require at least 8 bits. Data processing, however, generally uses 8 bits, or one byte, as the smallest unit of data storage. When ASCII is stored this way, the most significant bit of each byte is set to 0.
| Character | Decimal | Hexadecimal | Binary |
|---|---|---|---|
| A | 65 | 41 | (0) 1000001 |
| B | 66 | 42 | (0) 1000010 |
| C | 67 | 43 | (0) 1000011 |
| … | … | … | … |
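The rows of the table above can be reproduced with a short Python sketch. The function name `describe` is purely illustrative and not part of any standard; it relies only on the built-ins `ord` and `format`.

```python
def describe(char):
    """Return (decimal, hexadecimal, 8-bit binary) for one ASCII character."""
    code = ord(char)
    if code > 0x7F:
        # Anything above 127 does not fit into seven bits.
        raise ValueError(f"{char!r} is not an ASCII character")
    return code, format(code, "02X"), format(code, "08b")


for ch in "ABC":
    dec, hexa, bits = describe(ch)
    # Print the leading zero bit separately, as in the table.
    print(ch, dec, hexa, f"({bits[0]}) {bits[1:]}")
```

The first binary digit is always 0 for ASCII characters, which is the unused most significant bit of the byte.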
The eighth bit, which ASCII does not use, can also serve for error detection (as a parity bit) on communication lines or for other control functions. Today, however, it is almost always used to extend ASCII to an 8-bit code. These extensions are largely compatible with the original ASCII, so that all characters defined in ASCII are encoded with the same bit patterns in the various extensions. The extensions differ depending on hardware and software and are country-specific.
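As a rough illustration of the parity use, the following Python sketch places an even-parity bit in the most significant position. The choice of even parity and the function names are assumptions made for this example; the text does not say which convention a given line used.

```python
def add_even_parity(code):
    """Set bit 7 so that the whole byte has an even number of 1 bits."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has an odd number of ones
    return (parity << 7) | code


def check_even_parity(byte):
    """Return True if the received byte has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011, three ones, so bit 7 is set
print(format(sent, "08b"), check_even_parity(sent))
print(check_even_parity(sent ^ 0x01))  # a single flipped bit is detected
```

A single parity bit can only detect an odd number of flipped bits; it cannot say which bit was wrong.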
History
An early form of character encoding was Morse code. With the introduction of teleprinters it was displaced from the telegraph networks and replaced by the Baudot code and the Murray code. From the five-bit Murray code to seven-bit ASCII was then only a small step; ASCII, too, was first used for certain American teleprinter models, such as the Teletype ASR33. In the early days of the computer age, ASCII developed into the standard code for characters. For example, many terminals (such as the VT100) and printers were controlled exclusively with ASCII.
ASCII was originally intended to represent the characters of the English language. The first version, still without lowercase letters and with small deviations from today's ASCII, was created in 1963. The ASCII that is still valid today was defined in 1968. Later, in order to represent the special characters of other languages (for example, German umlauts), new codes with eight bits per character were adopted that used ASCII as a compatible basis. However, even an eight-bit code in which one byte stands for one character offered too little space to accommodate all the characters of human writing at once. Several different specialized extensions therefore became necessary. In addition, there are some ASCII-compatible encodings, above all for East Asia, that either switch between code tables or need more than one byte for every non-ASCII character. None of these eight-bit extensions "is" ASCII, however, because that name refers only to the uniform seven-bit code.
For encoding Latin characters, an encoding incompatible with ASCII (EBCDIC) is now used almost exclusively on mainframe computers.
Composition
| Code | … 0 | … 1 | … 2 | … 3 | … 4 | … 5 | … 6 | … 7 | … 8 | … 9 | … A | … B | … C | … D | … E | … F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 … | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 … | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 … | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 … | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 … | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 … | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6 … | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 … | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
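The table is read by combining the row label (the high hexadecimal digit) with the column label (the low hexadecimal digit). A minimal Python sketch of that lookup, with illustrative function names that are not taken from the source:

```python
def table_position(char):
    """Return (row, column) of an ASCII character in the 8 x 16 table."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError(f"{char!r} is not an ASCII character")
    return code >> 4, code & 0x0F  # high three bits, low four bits


def code_from_position(row, column):
    """Return the character code found at the given row and column."""
    if not (0 <= row <= 7 and 0 <= column <= 0xF):
        raise ValueError("position outside the table")
    return (row << 4) | column


row, column = table_position("A")
print(f"A is in row {row:X}, column {column:X}")      # row 4, column 1
print(format(code_from_position(7, 0xF), "02X"))      # the DEL cell
```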
The first 32 ASCII character codes (00 to 1F) are reserved for control characters; see control characters for an explanation of the abbreviations in the table above. These codes do not represent written characters but are (or were) used to control devices that use ASCII, such as printers. Control characters include, for example, the carriage return, the line feed, and the bell; their definitions have historical reasons.
Code 20 (SP) is the space (also called a blank), which is used in text as an empty character and word separator and is produced on the keyboard with the space bar.
Codes 21 to 7E are the printable characters, which comprise letters, digits, and punctuation marks (see table).
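The ranges described above can be written down as a small classifier. This is only a sketch; the category names are chosen for the example, and code 7F is included as "delete" so that the counts can be compared with the 33 non-printable and 95 printable characters.

```python
from collections import Counter


def classify(code):
    """Assign a 7-bit code to one of the ranges of the ASCII table."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    if code <= 0x1F:
        return "control"      # 00 to 1F
    if code == 0x20:
        return "space"        # SP
    if code == 0x7F:
        return "delete"       # DEL
    return "printable"        # 21 to 7E


counts = Counter(classify(code) for code in range(128))
non_printable = counts["control"] + counts["delete"]   # 32 + 1
printable = counts["printable"] + counts["space"]      # 94 + 1
print(non_printable, printable)
```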
Code 7F (all seven bits set to one) is a special character also referred to as the delete character (DEL). It was formerly used as a control character on punched tape or punched cards: a character that had already been punched could afterwards be deleted by setting all the bits, that is, by punching out all seven positions, because holes, once made, cannot be undone. Areas without holes (that is, code 00, NUL) were found mainly at the beginning and end of a punched tape.
For this reason, only 126 characters belonged to ASCII proper, because the bit patterns 0 (0000000) and 127 (1111111) did not represent character codes. Code 0 was later interpreted in the C programming language as the end of a string; code 127 was assigned various graphic symbols.
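Both special cases can be modeled in a few lines of Python. Punching extra holes corresponds to a bitwise OR, so overpunching any code with all seven holes always yields DEL; the C convention is imitated here by cutting a byte string at the first NUL. The function names are illustrative only.

```python
DEL = 0x7F  # all seven holes punched
NUL = 0x00  # no holes punched


def overpunch(code):
    """Punch all seven positions over an existing code (holes can only be added)."""
    return code | DEL


def c_string(buffer):
    """Return the bytes up to, but not including, the first NUL byte."""
    end = buffer.find(bytes([NUL]))
    return buffer if end == -1 else buffer[:end]


print(all(overpunch(code) == DEL for code in range(128)))
print(c_string(b"ASCII\x00leftover"))
```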
Extensions
ASCII contains no diacritical marks, which are used in almost all languages based on the Latin alphabet. The international standard ISO 646 (1972) was the first attempt to address this problem, but it led to compatibility problems. It is still a seven-bit code, and because no other codes were available, some code positions were reassigned in the new variants.
For example, ASCII position 93, the right square bracket (]), is replaced in the German variant ISO 646-DE by the capital U with umlaut (Ü) and in the Danish variant ISO 646-DK by the capital A with ring (Å). In programming, the square brackets used in many programming languages then had to be replaced by the corresponding national characters. This reduced the readability of the code and often led to unintentionally comical results, for instance when the Apple II startup message "APPLE ][" turned into "APPLE ÜÄ".
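The effect can be simulated by translating the affected ASCII characters into what a German ISO 646-DE device would display for the same codes. The text above names only the replacement for position 93; the remaining entries in the mapping below are filled in from the German national variant as an addition for this sketch, and the function name is illustrative.

```python
# Code positions that ISO 646-DE reassigns, written as
# "ASCII character" -> "character shown by a German device".
ISO646_DE = str.maketrans({
    "@": "§",
    "[": "Ä",
    "\\": "Ö",
    "]": "Ü",
    "{": "ä",
    "|": "ö",
    "}": "ü",
    "~": "ß",
})


def show_as_iso646_de(text):
    """Return ASCII text as it would appear on an ISO 646-DE device."""
    return text.translate(ISO646_DE)


print(show_as_iso646_de("APPLE ]["))
print(show_as_iso646_de("a[i] = 0;"))  # source code becomes hard to read
```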
Several manufacturers developed their own eight-bit codes. Code page 437 was long the most widely used; it was used on the IBM PC under English-language MS-DOS and is still used in the DOS window of English-language Microsoft Windows. In German installations, the Western European code page 850 has been the default since MS-DOS 3.3.
Later standards such as ISO 8859 also use eight bits. There are several variants, such as ISO 8859-1 for the Western European languages. German-language versions of Windows (except in the DOS window) use Windows-1252, an encoding based on ISO 8859-1. This is why, for example, the German special characters in text files created under DOS look wrong when viewed under Windows.
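The mismatch between the DOS and Windows code pages can be reproduced with Python's built-in codecs `cp850` and `cp1252`. This is a sketch with an illustrative function name; bytes that Windows-1252 leaves undefined are shown as the replacement character.

```python
def misread(text, written_as="cp850", read_as="cp1252"):
    """Encode text with one code page and decode the bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" avoids an exception for bytes undefined in the target code page.
    return raw.decode(read_as, errors="replace")


print(misread("Straße"))        # the German sharp s turns into another letter
print(misread("plain ASCII"))   # the ASCII range is unaffected
```

Only characters above code 127 are affected, because both code pages agree with ASCII in the lower half.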
Many older programs that used the eighth bit for their own purposes could not handle these extensions. Over time they were often adapted to the new requirements.
To meet the requirements of the different languages, Unicode (whose character repertoire is identical to that of ISO 10646) was developed. It uses up to 32 bits per character and could therefore distinguish more than four billion different characters, but it is limited to about one million permitted code points. This allows all characters used by humans so far to be represented, provided they have been included in the Unicode standard. UTF-8 is an 8-bit encoding of Unicode that is backward compatible with ASCII. A character can occupy one to four bytes. Seven-bit variants no longer need to be used, but Unicode can also be encoded in seven bits with UTF-7. As of 2011, UTF-8 is developing into a uniform standard on most operating systems. Apple's Mac OS X and some Linux distributions, among others, use UTF-8 by default, and more and more websites are created in UTF-8.
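The backward compatibility and the variable length of UTF-8 can be checked directly with Python's built-in codecs. The sample characters and the helper name are chosen for this sketch only.

```python
def utf8_length(char):
    """Return how many bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


# One sample each for one, two, three, and four bytes.
samples = ["A", "ä", "€", "\U0001F600"]
for char in samples:
    print(f"U+{ord(char):04X} takes {utf8_length(char)} byte(s)")

# Pure ASCII text has the same bytes in ASCII and in UTF-8.
print("ASCII".encode("utf-8") == "ASCII".encode("ascii"))

# UTF-7 keeps every byte within the seven-bit range.
print(all(byte < 0x80 for byte in "ä".encode("utf-7")))
```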
ASCII contains only a few characters that are generally used for formatting or structuring text; they derive from the control commands of teleprinters. These include in particular the line feed, the carriage return, the horizontal tab, the form feed, and the vertical tab. In typical ASCII text files, usually only the carriage return or the line feed occurs besides the printable characters, to mark the end of a line: DOS and Windows systems usually use both in succession, older Apple and Commodore computers (except the Amiga) use only the carriage return, and Unix-like and Amiga systems use only the line feed. The use of further characters for text formatting varies. Today, markup languages such as HTML are used instead for formatting text.
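Because of these three conventions, programs that read text files often normalize line endings first. A minimal Python sketch, with an illustrative function name:

```python
def normalize_newlines(text):
    """Convert CR LF (DOS, Windows) and lone CR (older Apple, Commodore) to LF."""
    # Replace the two-character sequence first so it is not turned into two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "dos line\r\nold mac line\runix line\n"
print(normalize_newlines(mixed).split("\n"))
```

The order of the two replacements matters: handling the lone carriage return first would split every CR LF pair into two line ends.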
Compatible character encodings
Many character encodings are designed so that they use the same codes as ASCII for characters in the range 0 to 127 and use the range above 127 for additional characters.
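Whether an encoding is compatible in this sense can be tested by decoding the bytes 0 to 127 and comparing the result with ASCII. The sketch below uses codec names from Python's standard library; the function name and the selection of codecs are choices made for this example. `cp500` is one of Python's EBCDIC codecs and serves as a counterexample.

```python
ASCII_BYTES = bytes(range(128))


def is_ascii_compatible(encoding):
    """Return True if bytes 0..127 decode to the same characters as in ASCII."""
    try:
        return ASCII_BYTES.decode(encoding) == ASCII_BYTES.decode("ascii")
    except UnicodeDecodeError:
        return False


for name in ["latin-1", "cp1252", "cp437", "koi8_r", "utf-8", "cp500"]:
    print(name, is_ascii_compatible(name))
```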
Fixed-length codes (selection)
Here a fixed number of bytes is used for each character. In most of these codes this is one byte per character; for East Asian scripts it is two or more bytes per character.
- ISO 8859 with 15 different character encodings covering all European languages, Turkish, Arabic, Hebrew, and Thai
- MacRoman, MacCyrillic and other proprietary character sets for Apple Mac computers before Mac OS X
- Windows and DOS code pages, for example Windows-1252
- KOI8-R for Russian, KOI8-U for Ukrainian
- ARMSCII-8 and ARMSCII-8a for Armenian
- GEOSTD for Georgian
- ISCII for all Indian languages
- TSCII for Tamil
Variable-length codes
In order to be able to encode more characters, the characters 0 to 127 are encoded in one byte, and other characters are encoded by multiple bytes with values above 127.
- UTF-8 for Unicode
- Big5 for Traditional Chinese (Taiwan, overseas Chinese)
- EUC (Extended Unix Code) for several East Asian languages
- GB (Guojia Biaozhun) for Simplified Chinese (PRC)
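The difference in length between ASCII and non-ASCII characters can be observed with Python's built-in codecs for these encodings. The codec names `big5`, `gb2312`, and `euc_jp` are the ones Python ships; using `gb2312` and `euc_jp` as representatives of the GB and EUC families, as well as the sample text, is an assumption of this sketch.

```python
def encoded_lengths(text, encoding):
    """Return the number of bytes each character of text occupies in the encoding."""
    return [len(char.encode(encoding)) for char in text]


sample = "A中"  # one ASCII letter and one Chinese character
for name in ["utf-8", "big5", "gb2312", "euc_jp"]:
    print(name, encoded_lengths(sample, name))
```

In each case the ASCII letter takes a single byte, while the Chinese character takes two or three.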
ASCII table
The ASCII table contains all the codes of the ASCII character set; see control characters for the meaning of the abbreviations.