For letters and other characters to be stored on a computer system, they must be encoded as binary codes. This is because computers can only store 1s and 0s, which correspond to high and low voltages. Agreed sets of codes are used to represent characters. The most important are standard 7-bit ASCII (American Standard Code for Information Interchange), which has 2 to the power of 7 or 128 possible combinations of 1s and 0s, and Unicode, which was originally designed as a 16-bit code with 2 to the power of 16 or 65536 combinations and has since been extended beyond that.
When using ASCII, the binary representation of a digit character (0-9) is different from how that number would be represented in binary if it were being stored or operated upon as a value. For example, the number 1 would usually be 0000001 in binary, but the character '1' is represented as 0110001 (49). The digit characters run in order from 0110000 (48) for '0' to 0111001 (57) for '9'.
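The difference between a number and its digit character can be checked directly. The following is a small Python sketch; the variable names are only for illustration.

```python
# Compare the numeric value 1 with the ASCII code for the character '1'.
value = 1
char = "1"

value_bits = format(value, "07b")      # the number one as 7-bit binary
char_code = ord(char)                  # ASCII code of the character '1'
char_bits = format(char_code, "07b")   # that code as 7-bit binary

print(value_bits)   # 0000001
print(char_code)    # 49
print(char_bits)    # 0110001

# Digit characters are stored consecutively starting at '0' (code 48), so
# the numeric value can be recovered by subtracting the code for '0'.
recovered = ord(char) - ord("0")
print(recovered)    # 1
```

Subtracting the code for '0' is a common way of turning a single digit character back into a number.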
Some forms of ASCII use an extra bit for an extended character set (256 characters), many of which are various symbols, mathematical operators and letters from other languages. However, the extra bit (which makes the code 8-bit) is most often used for error checking, which is important for receiving text correctly.
ASCII is a very limited character set. Even the extended versions with 256 characters only contain letters from European languages such as English and French, which use the Latin alphabet originating from the Romans. As such, there is no way to represent letters from languages such as Arabic, or those based upon the Cyrillic alphabet. For these characters to be represented, Unicode must be used. A 16-bit code is sufficient for the characters of most languages in current use, plus mathematical and other symbols, and later versions of Unicode go beyond 16 bits to cover the rest.
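As a rough illustration of why ASCII is not enough, the sketch below looks up the Unicode code points of one Latin, one Cyrillic and one Arabic letter in Python. The choice of letters is arbitrary, and escape sequences are used so the example does not depend on how the file itself is encoded.

```python
# Code points for a Latin, a Cyrillic and an Arabic letter.
letters = {
    "Latin A": "A",
    "Cyrillic Zhe": "\u0416",
    "Arabic Beh": "\u0628",
}

for name, ch in letters.items():
    code_point = ord(ch)
    fits_ascii = code_point < 128          # 7-bit ASCII covers 0 to 127
    print(name, code_point, fits_ascii)

# Trying to encode a Cyrillic letter as ASCII raises an error.
try:
    "\u0416".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# The same letter fits in 16 bits (two bytes) as UTF-16.
print(len("\u0416".encode("utf-16-be")))
```

Only the Latin letter has a code point below 128, so it is the only one of the three that 7-bit ASCII can represent.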
There are two main types of parity checking: even parity and odd parity (sometimes called even bit checking and odd bit checking). These are used because just one incorrect binary digit in a character changes that character, and enough of these can make a passage of text impossible to understand. Both rely upon the use of the 8th bit as a check bit. In odd parity the total number of 1s must always be odd, so if the 7-bit ASCII code has an even number of 1s then the check bit will be a 1, making the number of 1s odd. If the number of 1s is not odd when received, it will be assumed that the data has been corrupted in transmission or interfered with, and it will be requested again.
Even parity is the opposite: the total number of 1s must be even, the check bit is set to achieve this, and if the number of 1s is odd when received the data will be considered incorrect. Parity checking of either kind is potentially flawed, because if two bits were flipped the total would still have the expected parity, and the error would not be detected when the code was received.
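A minimal sketch of both kinds of parity in Python follows. It assumes the check bit is placed in front of the 7-bit code, and the function names (parity_bit, passes_check, flip) are made up for this example.

```python
def parity_bit(code, odd=False):
    """Return the check bit ('0' or '1') for a string of bits."""
    ones = code.count("1")
    target = 1 if odd else 0          # required value of (total 1s) % 2
    return "0" if ones % 2 == target else "1"


def passes_check(byte, odd=False):
    """True if the received 8 bits have the expected parity."""
    target = 1 if odd else 0
    return byte.count("1") % 2 == target


def flip(byte, index):
    """Return a copy of byte with the bit at index inverted."""
    flipped = "1" if byte[index] == "0" else "0"
    return byte[:index] + flipped + byte[index + 1:]


code = format(ord("A"), "07b")                 # 7-bit ASCII for 'A'
sent_even = parity_bit(code) + code            # check bit placed first
sent_odd = parity_bit(code, odd=True) + code

one_error = flip(sent_even, 3)                 # one bit corrupted
two_errors = flip(one_error, 5)                # a second bit corrupted

print(code, sent_even, sent_odd)
print(passes_check(sent_even))    # intact byte
print(passes_check(one_error))    # single flipped bit
print(passes_check(two_errors))   # two flipped bits
```

The byte with one flipped bit fails the check, but the byte with two flipped bits passes it even though it is wrong, which is the weakness described above.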
Error correction refers to more powerful techniques that may be used in addition to, or instead of, error detection techniques. The Majority Vote method is one of the simplest of these. The code is sent three times, and the most common value for each bit is used. Therefore, if two out of three copies show a 1 in the 4th position but the third copy shows a 0, it will be assumed that the third copy has been corrupted and that the correct digit is a 1. This can correct numerous errors, provided they fall in different positions. Its main flaws are that three times as much data has to be sent, and that if the same bit were corrupted in two of the three copies then an incorrect majority would be accepted, although the chances of this happening are low.
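The voting step can be sketched in a few lines of Python. The function name majority_vote and the sample bit strings are assumptions made for this example.

```python
def majority_vote(copies):
    """Rebuild a code from an odd number of equal-length copies, bit by bit."""
    result = []
    for bits in zip(*copies):                  # the same position in every copy
        ones = bits.count("1")
        result.append("1" if ones > len(bits) // 2 else "0")
    return "".join(result)


original = "1000001"

# One copy has its 4th bit corrupted; the other two outvote it.
received = ["1000001", "1001001", "1000001"]
corrected = majority_vote(received)
print(corrected)

# If the same bit is corrupted in two copies, the wrong value wins.
unlucky = ["1001001", "1001001", "1000001"]
wrong = majority_vote(unlucky)
print(wrong)
```

The first result matches the original code, while the second does not, which shows both the strength and the limit of the method.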
Two other codes worth knowing are Hamming code, which is an error correction method, and Gray code, which is designed to reduce errors when a value changes. Hamming code uses the positions that are powers of 2 (1, 2, 4, 8 and so on) as parity bits, and the other positions as locations for the data. Depending upon the position of the parity bit, different patterns of checking and skipping are used. The pattern for the parity bit at position n can be summed up as: skip n - 1 bits, then repeatedly check n bits and skip n bits. There are several of these sets of checked positions (often 4, which is the number needed to protect a 7-bit code). Each set initially holds an odd number of data bits, and its parity bit is chosen in the same manner as previously described so that the number of 1s in the set is always even. When the code is received, the positions of the parity bits whose checks fail add up to the position of a single flipped bit, which can then be corrected.
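To make the checking pattern concrete, here is a small Python sketch of the shortest common version, which protects 4 data bits with 3 parity bits at positions 1, 2 and 4 (a 7-bit code word). The function names are invented for this example, and even parity is assumed as in the description above.

```python
def covered_positions(p, length):
    """Positions (counting from 1) checked by the parity bit at position p."""
    return [i for i in range(1, length + 1) if i & p]


def hamming_encode(data):
    """Encode a string of 4 data bits as a 7-bit code word (even parity)."""
    word = [0] * 8                    # index 0 unused so indexes match positions
    for pos, bit in zip([3, 5, 6, 7], data):
        word[pos] = int(bit)          # data goes in the non-power-of-2 positions
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in covered_positions(p, 7)) % 2
    return "".join(str(b) for b in word[1:])


def hamming_correct(word):
    """Return (corrected word, error position); position 0 means no error found."""
    bits = [0] + [int(b) for b in word]
    error_pos = 0
    for p in (1, 2, 4):
        if sum(bits[i] for i in covered_positions(p, 7)) % 2 == 1:
            error_pos += p            # failed checks add up to the bad position
    if error_pos:
        bits[error_pos] ^= 1          # flip the faulty bit back
    return "".join(str(b) for b in bits[1:]), error_pos


print(covered_positions(2, 7))        # skip 1, check 2, skip 2, check 2
sent = hamming_encode("1011")
received = sent[:4] + ("1" if sent[4] == "0" else "0") + sent[5:]   # corrupt position 5
fixed, where = hamming_correct(received)
print(sent, received, fixed, where)
```

This sketch only handles a single flipped bit per code word; two flipped bits would be "corrected" to the wrong value.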
Gray code is an ordering of binary values in which only one bit changes between consecutive numbers. Every number in a given range can still be represented, and because only one bit changes at a time there are no 'in-between' values that could be misread while a value is changing. For this reason it is considered more reliable than standard binary where response times are slow: in standard binary several bits may change at once (for example 0111 to 1000), and if they do not all change at the same instant an incorrect value may be read briefly. Gray code counters can also use less energy than their binary equivalents, as fewer bits switch on each count. One of their key uses is passing information between microchips that operate on different clock frequencies.
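The one-bit-at-a-time property can be checked with a short Python sketch. It uses the usual conversion for the standard (reflected) Gray code, which is an assumption here since the text does not say which Gray code is meant, and the function names are made up for this example.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected Gray code value."""
    return n ^ (n >> 1)


def from_gray(g):
    """Convert a reflected Gray code value back to the ordinary integer."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


gray_table = [format(to_gray(n), "03b") for n in range(8)]
print(gray_table)

# Counting from 3 to 4: binary flips three bits, Gray code flips one.
print(bits_changed(3, 4))
print(bits_changed(to_gray(3), to_gray(4)))
```

Every step in the table differs from the previous one in exactly one bit, whereas ordinary binary changes all three bits when counting from 3 (011) to 4 (100).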
Well done - a very thorough summary. You could improve it with an image of the ASCII table. Well done for having a look at Hamming and Gray as well.