The Ultimate Question of Bases, Binary, and Everything

It has come to my attention that the terms and formats used when talking about RFID IDs, among other things, need some clarification.

Well @amal


To try and keep this balanced between detail and “brain dumping everything into a single run-on sentence” I have split this into vaguely structured sections. Each section will start with an overview, a simple explanation, and then dive into more details. I have also linked some resources that will be much better than my drivel.

Most of this knowledge has become instinctual after working with this stuff for the last 15 years, so I may inadvertently assume something is common knowledge… if you catch me doing that, please point it out and I will attempt to elaborate. I would also like to point out that while I use this knowledge, I am just a lowly self-taught software engineer, not a computer scientist or mathematician, so sorry if I make some errors. If you spot any, please correct me; I want to get better at writing, so feedback is immensely helpful!


High Level Summary - TL;DR;

Digital logic, communication and memory work on a “simple” ON or OFF system. We can communicate and store data as a sequence of ON and OFF signals known as binary.

We refer to a single ON or OFF value as a bit and 8 bits make a byte.

One byte can represent 256 different values.

There are various ways to display this data, a common

Encoding is used to map these numbers to various data formats.

ASCII is a mapping between 1 byte and 1 character, where a character is a letter, number, symbol or “control character”.

So if you have 4 bytes of storage you can store:

  • a number from 0 to 4,294,967,295
  • 4 characters of text
  • anything else you can encode with 32 ON or OFF “switches”

Depending on the format and encoding this data may look quite different. These are all common ways to display the same data:

  • 1,214,852,405 (Decimal number)
  • 1001000011010010010110100110101 (Binary number)
  • 48 69 2d 35 or 0x48692d35 or 48:69:2d:35 (Hex number)
  • Hi-5 (ASCII)
  • 楈㔭 (UTF-16)
  • and many many more
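
To make the list above a bit more concrete, here is a minimal Python sketch (Python is just my pick for examples, nothing here depends on it) that takes the same four bytes and shows them in each format. The variable names are made up for illustration, and I am assuming big-endian byte order for the number and little-endian order for the UTF-16 text, as that is what reproduces the values listed.

```python
# The same four bytes, displayed in several common formats.
data = bytes([0x48, 0x69, 0x2D, 0x35])

as_decimal = int.from_bytes(data, "big")  # read the bytes as one big-endian number
as_binary = format(as_decimal, "b")       # binary digits, leading zeros dropped
as_hex = data.hex()                       # two hex digits per byte
as_ascii = data.decode("ascii")           # one character per byte
as_utf16 = data.decode("utf-16-le")       # one character per two bytes here

print(f"{as_decimal:,}")   # 1,214,852,405
print(as_binary)
print(as_hex)              # 48692d35
print(data.hex(" "))       # 48 69 2d 35 (separator argument needs Python 3.8+)
print(data.hex(":"))       # 48:69:2d:35
print(as_ascii)            # Hi-5
print(as_utf16)
```

Nothing about the stored data changes between those lines; only the interpretation does.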

Number Bases - There are 10 types of people

For most people numbers seem pretty universal…
101 is “one hundred and one” it’s simple… right?


What if I told you 101 could also mean “five” and “two hundred and fifty seven” among other things…


Most people’s day to day use of numbers is in a system called decimal or base 10, but even if you have never thought about it, we sort of use base 12 and base 60 number systems for telling the time… You can blame that on the Sumerians and their duodecimal (base 12) and sexagesimal (base 60) number systems.

There are many different number bases in use, but in fields such as mathematics and computer science we usually only deal with a few: base 2, base 10, and base 16, which I will focus on. Despite the focus on bases more relevant to this forum, the knowledge translates well to other bases if you ever come across them.

Talking about different number bases will inevitably lead to confusion… and I am already sick of typing out the numbers as words… so I will be using a subscript to identify the base of numbers I use. So “forty two” in base 10 will be 42_{10} although if I use a number without a subscript you may assume it is in base 10 as putting _{10} after all the numbers will just look messy.

Binary Numbers - Base 2

Base 2 numbers are commonly known as binary numbers. This is a way of representing a number using only the symbols 1 and 0. Ain’t nobody got time for 2, 3, 4, 5, 6, 7, 8, or 9. This is particularly useful when dealing with electronics as you can substitute 1 and 0 for ON and OFF; I will go over this in more detail in the “Digital Data” section below.

Counting in binary is just like counting in decimal, let’s start at 0: (as you should, none of this starting at 1 rubbish)
0_{2} = 0_{10}
1_{2} = 1_{10}
As you can see, so far decimal and binary are no different… but what comes next? We do not have the digit 2 in binary, so we add another digit and keep going
10_{2} = 2_{10}
11_{2} = 3_{10}
We have exhausted the combinations we can make with 2 digits now, so let’s add some more
100_{2} = 4_{10}
101_{2} = 5_{10}
110_{2} = 6_{10}
111_{2} = 7_{10}
1000_{2} = 8_{10}

11111111_{2} = 255_{10}

As you may have picked up, because there are only two digit symbols available in binary, a binary number with only one digit can only represent two numbers, however this doubles for each digit you add so a two digit binary number can represent four numbers and a three digit binary number can represent eight numbers…
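
The doubling described above is just the base raised to the number of digits. A tiny Python sketch, with a function name I made up purely for illustration:

```python
def representable_values(digits: int, base: int = 2) -> int:
    """How many different numbers fit in `digits` digits of the given base."""
    return base ** digits

for digits in (1, 2, 3, 8, 32):
    print(digits, "binary digits ->", representable_values(digits), "values")
```

With 8 digits (one byte) this gives 256 values, and with 32 digits (four bytes) it gives 4,294,967,296 values, which is why the largest number in the summary is 4,294,967,295 (counting starts at 0).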

Each digit from right to left can be thought of as representing the base, 2, raised to the power of its index (how many digits from the right it is, starting at 0), so
2^8, 2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0
if we calculate what value each index represents we get
256, 128, 64, 32, 16, 8, 4, 2, 1
We can then just add up the numbers in each place to get the decimal representation of the binary number. Let’s give that a go
101010_2
Let’s take each digit that is 1 and add the value for that index together.

101010_2 = 2^5 + 2^3 + 2^1 = 32 + 8 + 2 = 42 = the answer to the Ultimate Question of Life, the Universe, and Everything.
In other words you are just adding together the value of each digit that is ON.
42_{10} = ☒☐☒☐☒☐
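
The “add up the place values that are ON” method can be sketched in a few lines of Python. The function name is hypothetical, and in real code you would normally just call the built-in `int(text, 2)`; this version spells out the steps on purpose.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up 2**index for every digit that is ON, counting indexes from the right."""
    total = 0
    for index, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        if digit == "1":
            total += 2 ** index
    return total

print(binary_to_decimal("101010"))    # 42
print(binary_to_decimal("11111111"))  # 255
```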

Everything we have covered so far is fairly general, and we have only looked at positive integers (natural numbers including 0), but what about real numbers? This is probably beyond the scope of this post as not only am I lazy, but I also would not be able to do the topic justice… This information would likely not be of much use to members of this forum as RFID chip IDs are usually positive integers, but if it interests you hopefully these links help get you started:

Hexadecimal Numbers - Base 16

Base 16 numbers are commonly known as hexadecimal numbers, or simply hex. This is a way of representing a number using not only the symbols 0 - 9 but also A, B, C, D, E, and F, giving us a total of 16 digits. Hex is often used in computing because 16 is a power of two (2^4), meaning that each hex digit maps exactly to four binary digits, and this seems to be the most convenient split (at least for now; in the 1960s octal, or base 8, numbers were much more common, with each digit mapping to three binary digits).

If you read the binary numbers section you might see where this is going… Counting in hex is just like counting in decimal, let’s start at 0 again:
0_{16} = 0_{10}
1_{16} = 1_{10}
2_{16} = 2_{10}
You get the idea… let’s skip a few…
9_{16} = 9_{10}
This is where we run out of Arabic numerals but never fear, we can just add some letters
A_{16} = 10_{10}
B_{16} = 11_{10}
Again… you get the idea… let’s skip a few…
F_{16} = 15_{10}
Now, just like the other positional numeral systems, we simply add another digit.
10_{16} = 16_{10}

FF_{16} = 255_{10}

Again, each digit from right to left can be thought of as representing the base, 16, raised to the power of its index (how many digits from the right it is, starting at 0), so
16^3, 16^2, 16^1, 16^0 or 4096, 256, 16, 1
But unlike binary there is not simply ON and OFF, so we have to multiply each place value by the digit symbol’s value and then add them all together. Let’s give that a go
2A_{16}
Let’s turn this into an equation and solve it

2A_{16} = (2_{16} \times 16^1) + (A_{16} \times 16^0) = (2 \times 16) + (10 \times 1) = 32 + 10 = 42 = the answer to the Ultimate Question of Life, the Universe, and Everything.
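
The same calculation as a short Python sketch: each symbol’s value is multiplied by 16 to the power of its index. The names are made up for illustration, and the built-in `int(text, 16)` does the same job in practice.

```python
HEX_DIGITS = "0123456789ABCDEF"  # a symbol's position in this string is its value

def hex_to_decimal(text: str) -> int:
    """Multiply each digit's value by 16**index (index counted from the right) and add."""
    total = 0
    for index, symbol in enumerate(reversed(text.upper())):
        total += HEX_DIGITS.index(symbol) * 16 ** index
    return total

print(hex_to_decimal("2A"))  # 42
print(hex_to_decimal("F"))   # 15
print(hex_to_decimal("10"))  # 16
print(hex_to_decimal("FF"))  # 255
```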

Identifying Bases - What are we dealing with?

In a relationship, the commonly accepted meaning of getting to 1st base means kissing / snogging / making out… Oops wrong type of bases…

Bases are perhaps the most important element of a baseball field. There are four bases: home plate, first base… Damn I did it again… Sorry sorry…

As I said in the introduction to this section, 101 could also mean “five” and “two hundred and fifty seven” among other things.

Hopefully you now understand that a set of digits can mean different things depending on the base you use, so the digits 101 can represent a bunch of different values, for example:

101_{2} = 5_{10}
101_{3} = 10_{10}
101_{8} = 65_{10}
101_{10} = 101_{10}
101_{16} = 257_{10}
101_{42} = 1765_{10}
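
If you want to check those values yourself, here is a small Python sketch that works for any base. Python’s built-in `int(text, base)` only goes up to base 36, so this hypothetical helper takes the digits as a list of numbers instead, which also covers base 42.

```python
def digits_to_value(digits, base):
    """Interpret a list of digit values (most significant first) in the given base."""
    value = 0
    for digit in digits:
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        value = value * base + digit  # shift what we have one place left, add the new digit
    return value

for base in (2, 3, 8, 10, 16, 42):
    print(f"101 in base {base} = {digits_to_value([1, 0, 1], base)}")
```

Multiplying the running total by the base at each step is equivalent to adding up digit times base to the power of its index; it just avoids calculating the powers separately.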

Unfortunately this brings up an issue: how do you know what base is being used?

For some things you can make an assumption; for example, if I see a number like 863D1 I will assume it is hex. Fun fact: BAD is a valid hex number… there are a few words like this (see here)

To address this a number of prefixes and suffixes have become common such as the _2, _{10}, and _{16} system I am using in this post.

For hex, a prefix of 0x or a suffix of h is common, for example 0x2A or 2Ah. I prefer 0x as I find the suffix is easy to confuse with a unit. Hex data is also often written split into blocks of 2 digits with a space or : separating them.

For binary there are many conventions; a prefix of 0b is the most common one I have seen, but here is a longer list.

Keep in mind that, as with anything, people forget or do not bother to label things, so often things will not be clear. For example, the highlighted value is the hex representation of the tag ID, but by luck of the draw it contains no A-F characters, so someone not used to that interface is not in for a fun time…
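
As a rough illustration of how software can use these labels, here is a Python sketch. Python itself understands the 0x and 0b prefixes (passing base 0 to `int` tells it to work the base out from the prefix); the h suffix and the colon-separated byte format are handled by hand. Both function names are hypothetical.

```python
def parse_labelled(text: str) -> int:
    """Parse a number labelled with a 0x / 0b prefix or an h suffix; plain digits are decimal."""
    text = text.strip()
    lowered = text.lower()
    if lowered.endswith("h") and not lowered.startswith("0x"):
        return int(text[:-1], 16)  # suffix style, e.g. 2Ah
    return int(text, 0)            # base 0: infer the base from the prefix

def parse_hex_bytes(text: str) -> bytes:
    """Parse hex data written as 48692d35, 0x48692d35, 48 69 2d 35 or 48:69:2d:35."""
    cleaned = text.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    return bytes.fromhex(cleaned.replace(":", " "))

print(parse_labelled("0x2A"), parse_labelled("2Ah"), parse_labelled("0b101010"), parse_labelled("42"))
print(parse_hex_bytes("48:69:2d:35"))
```

Note that an unlabelled value is treated as decimal here, which is exactly the kind of assumption that goes wrong when a hex ID happens to contain no A-F characters.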

Converting Between Bases - What you need to know?

There are hundreds of ways to do this. I like just using a site like this, or even the calculator on Windows if you swap to the programmer mode (for those on …).
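
If you have Python handy, its built-ins are one more of those hundreds of ways. This is only a sketch of the calls I would reach for, with throwaway variable names:

```python
value = 42

binary_text = format(value, "b")    # decimal -> binary digits
hex_text = format(value, "X")       # decimal -> upper-case hex digits
padded_byte = format(value, "08b")  # same number padded out to a full 8 bit byte

from_binary = int("101010", 2)      # binary digits -> number
from_hex = int("2A", 16)            # hex digits -> number

print(binary_text, hex_text, padded_byte)  # 101010 2A 00101010
print(from_binary, from_hex)               # 42 42
```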



Digital Data - Bits, Bytes, and Nibbles!

Now that we have a basic understanding of how binary and hex numbers work in theory we can look at the practical elements of modern data storage as it applies to RFID chips and other storage and communication mediums.

Digital circuitry works on “HIGH” and “LOW” logic signals, essentially ON and OFF.
Digital memory uses memory cells that are essentially switches.

A bit is a single binary digit.
A nibble is four bits or one hex digit.
A byte is eight bits or two nibbles.
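
Because a nibble is exactly one hex digit, splitting a byte into its two nibbles gives you its two hex digits. A minimal Python sketch (the function name is my own invention):

```python
def split_nibbles(byte: int):
    """Return the (high, low) nibbles of a single byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be between 0 and 255")
    high = byte >> 4    # shift the top four bits down
    low = byte & 0x0F   # mask off everything except the bottom four bits
    return high, low

high, low = split_nibbles(0x2A)
print(f"{high:X} {low:X}")      # 2 A
print(f"{high:04b} {low:04b}")  # 0010 1010
```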

Most people talking about data sizes will use bytes… kB, MB, GB, etc… or KiB, MiB, GiB…

Mini rant about Metric, IEC and JEDEC standards…

Basically the issue is the “byte” is not an SI unit…
This has resulted in GB being used to represent both the 1000^3 (Metric) and 1024^3 (JEDEC) standards… This is the reason why your 16GB flash drive shows up as 14.9GB on Windows computers, as Windows follows the JEDEC standard but most flash drive manufacturers use Metric.
The IEC standard was meant to help separate the 2 units so that GB = 1000^3 and GiB = 1024^3, but that did not really take off. I remember the first time I used IEC labeling in a software application, we got so many people pointing out that we had “mistakenly” added an “i” between our units that I was told to remove it… FML…
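
The flash drive arithmetic is easy to check. A quick Python sketch, with a made-up function name:

```python
def advertised_to_reported(gigabytes: float) -> float:
    """Convert a size advertised in 1000^3 byte units into 1024^3 byte units."""
    total_bytes = gigabytes * 1000 ** 3
    return total_bytes / 1024 ** 3

print(f"{advertised_to_reported(16):.1f}")  # 14.9
```

Same number of bytes on the drive, two different definitions of the unit.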


So to look at some practical examples:

The xNT/NExT implant has an NTAG216 chip inside. This has 888 bytes of available memory, or 7104 individual bits, that you can configure. It also has a 7 byte (56 bit) UID (unique identifier) that is not changeable, along with some other bits and pieces (sorry, I couldn’t help myself) for config.

The Wiegand interface used by many RFID readers comes in different bit lengths; the “standard” format is 26 bit, although there are others. It is called “26 bit Wiegand” because when a card is read, the reader sends a 26 bit representation of the ID of the card.
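
To give a feel for what those 26 bits might contain, here is a Python sketch. It assumes the commonly described layout (often called H10301): one even parity bit, an 8 bit facility code, a 16 bit card number, and one odd parity bit, with each parity bit covering the nearest 12 data bits. That layout is my assumption and is not taken from this post, other 26 bit layouts exist, and the function name is made up, so treat this as an illustration rather than a reference.

```python
def wiegand26_frame(facility_code: int, card_number: int) -> str:
    """Build a 26 bit frame as a string of 0s and 1s (assumed H10301-style layout)."""
    if not 0 <= facility_code < 2 ** 8:
        raise ValueError("facility code must fit in 8 bits")
    if not 0 <= card_number < 2 ** 16:
        raise ValueError("card number must fit in 16 bits")

    data = f"{facility_code:08b}{card_number:016b}"  # 24 data bits
    first_half, second_half = data[:12], data[12:]

    # Leading bit makes the count of 1s in (parity bit + first 12 data bits) even.
    even_parity = "1" if first_half.count("1") % 2 else "0"
    # Trailing bit makes the count of 1s in (last 12 data bits + parity bit) odd.
    odd_parity = "0" if second_half.count("1") % 2 else "1"

    return even_parity + data + odd_parity

frame = wiegand26_frame(facility_code=42, card_number=1234)
print(frame, len(frame))
```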

In this image you can see the scan of a tag on a phone, the last block shows the memory content in hex where each two hex digits is 1 byte or 8 bits.


Encoding - Data vs Digits vs Characters?

The thing about all this binary number stuff is most people do not want to store or send only numbers. With implants for example you may want to store a website URL in the user data.

This is where encoding comes in. There are various encoding standards, some more standard than others. For this post I am just going to talk about text encoding, specifically ASCII. Essentially we will be looking at how computers store text as a number.


Say you have this binary “number”

10001000110000101101110011001110110010101110010011011110111010101110011001000000101010001101000011010010110111001100111011100110010000001101001011100110010000001000011011011110110111101101100

As a decimal number it would be

1,676,687,209,678,522,474,642,500,958,268,426,250,008,928,284,490,152,767,340

So not that useful… Each byte of data in this case is an ASCII character. A character is a number, letter, symbol or one of a number of “special characters” like the TAB character. There are 256 possible values that you can represent with a single byte, but standard ASCII only defines 128 of them (0 to 127); the various 8 bit character sets that fill in the other 128 values are usually called Extended ASCII.

Now that you have the table, you can convert that number into characters

Here are the bytes in hex representation:

44 61 6e 67 65 72 6f 75 73 20 54 68 69 6e 67 73 20 69 73 20 43 6f 6f 6c

Which is Dangerous Things is Cool
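
Doing the table lookup by hand gets old quickly, so here is a Python sketch of the same decoding. It assumes every byte is plain ASCII, which is true for this example; the variable names are just for illustration.

```python
hex_dump = "44 61 6e 67 65 72 6f 75 73 20 54 68 69 6e 67 73 20 69 73 20 43 6f 6f 6c"

data = bytes.fromhex(hex_dump)  # 24 bytes
text = data.decode("ascii")     # look each byte up in the ASCII table

# Show the first few bytes as binary, hex, decimal and character.
for byte in data[:4]:
    print(f"{byte:08b}  0x{byte:02x}  {byte:3d}  {chr(byte)!r}")

print(text)  # Dangerous Things is Cool

# The same bytes read as one (very large) big-endian number.
as_number = int.from_bytes(data, "big")
print(as_number.bit_length(), "bits")
```

The large number has 191 bits rather than 192 because the leading zero bit of the first byte is dropped when it is written out as a number.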

To be 100% clear, this is not the only way to store data. For example, the NDEF records that you store on NFC chips have types. One is known as “Absolute URI”, but the type is not stored as text like that; it is stored as 03_{16}. Then the link could be an “http://” link or a “https://” link or even a “mailto:” link, and instead of encoding this prefix as ASCII text, storage is saved by assigning each link type its own value (URI ID Code), as you can see in this slide explaining NDEF TNF encoding.

Another encoding system that is quite commonly discussed in computer science is Huffman coding, which is not particularly relevant to this forum, but I am linking it because it is cool and gives some insight into how encodings are designed.
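
Here is a rough Python sketch of the space saving from URI ID Codes. The code table is a small subset written from my memory of the NFC Forum URI record definition, so please check it against the spec before relying on it; the function name is made up, and this only builds the abbreviated payload, not a complete NDEF record.

```python
# Assumed subset of URI ID Codes (from memory of the NFC Forum URI record definition).
URI_ID_CODES = {
    0x01: "http://www.",
    0x02: "https://www.",
    0x03: "http://",
    0x04: "https://",
    0x06: "mailto:",
}

def abbreviate_uri(uri: str) -> bytes:
    """Replace the longest known prefix with its one byte code (0x00 means no prefix)."""
    best_code, best_prefix = 0x00, ""
    for code, prefix in URI_ID_CODES.items():
        if uri.startswith(prefix) and len(prefix) > len(best_prefix):
            best_code, best_prefix = code, prefix
    return bytes([best_code]) + uri[len(best_prefix):].encode("utf-8")

uri = "https://dangerousthings.com"
payload = abbreviate_uri(uri)
print(len(uri), "bytes as plain text,", len(payload), "bytes abbreviated")
print(payload.hex(" "))
```

Seven bytes saved does not sound like much, but on a chip with a few hundred bytes of user memory it adds up.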


A Practical Example - The inspiration for this thread

This thread was requested at the end of this thread

All of the values listed in this image are the same number represented in different formats… the only way to work out what is what in this case was to go and read the code unless you had an intimate understanding of the labels already.

This is a great showcase of why new users struggle with the Proxmark system…


Changelog
2020/04/25 - Initial Post

Author: @NiamhAstra

  • Laid out structure
  • Filled out preliminary content
  • :crossed_fingers: that the math plugin comes out soon so it can be read correctly
2020/06/17 - Fix MathJax Formatting

Author: @NiamhAstra

  • Removed all the “More Details” sections as it was breaking the MathJax.
  • Removed warnings re missing MathJax plugin.


Normally I would try to avoid commenting on a wiki; just like, subscribe and walk away.

I found myself needing some information on an unrelated matter, but I remembered this wiki you wrote.
Found exactly what I was after without having to search all of Google for it.
THEREFORE
Thanks @NiamhAstra, some great learnings in there and knowledge expansion for me.
Sounds like you also answered everything @amal asked for, unless I’m missing something.
Either way, here is your xmas bonus from me :gift:
I hope Amal’s is a little more tangible and implantable :syringe: :wink:


My ego is about to go critical…
