MAP-T

Index

  1. Introduction
  2. Foreword
  3. Thought Process
  4. The MAP Address Format
    1. End-user IPv6 Prefix
    2. Rule IPv6 Prefix
    3. EA-bits
    4. Subnet ID
    5. Interface ID
    6. 16 bits
    7. IPv4 address
    8. PSID
  5. CE Configuration
  6. CE Behavior
  7. BR Configuration
  8. BR Behavior
  9. Additional Configuration
    1. The a, k and m configuration variables
    2. FMR on the CE

Introduction

This document is a layman’s (but exhaustive) slightly sardonic explanation of MAP-T. It is intended to serve as a replacement for RFCs 7597 and 7599, or at least, as preparatory reading for them. I’m assuming you’ve already consumed the general introduction to the topic, so you know what you’re getting into.

Warning! Please be aware that Jool does not yet implement MAP-T. (Support will be added in version 4.2.0.)

In any case, this particular document does not deal with Jool in any way. (That’s the tutorial’s job.)

Expected background knowledge:

  • IPv4 addresses
  • IPv6 addresses
  • Hexadecimal and binary numbers

Foreword

The MAP RFCs argue that, depending on how many IPv4 addresses you have, and how many you’re willing to assign to each CE, there are three different MAP-T scenarios:

  1. You have fewer IPv4 addresses than CEs, so your CEs will have to share IPv4 addresses.
  2. You have the same number of IPv4 addresses as CEs, so each CE will have one IPv4 address.
  3. You have more IPv4 addresses than CEs, thus you can assign more than one IPv4 address to each CE.

In my opinion, the first scenario is the only one that truly makes sense. (If you have that many IPv4 addresses, I think SIIT-DC-2xlat would be a simpler alternative to MAP-T.) However, I will walk you through all three of them, in the hopes that some variety might facilitate the “aha!” moment. (TODO: I still haven’t done scenarios 2 or 3.)

First, let’s take a look at scenario 1.

Thought Process

In order to define your MAP-T network, you first need a general idea of how you’re going to distribute your available public transport addresses.

Suppose you have the entirety of block 192.0.2.0/24 to distribute among your CEs. Suppose, as well, that you have 5000 customers.

Let’s define some variables:

  • r = Length of the IPv4 prefix (Defined by RFC 7597)
  • p = Length of the IPv4 suffix (Defined by RFC 7597)
  • a4 = Number of available IPv4 “a”ddresses
  • c = Total “c”ustomers
  • c4 = “C”ustomers per IPv“4” address

Yes, I’m also very upset by the RFC’s choices.

In our example,

Equations: 1

Note! Not sure if I should be explaining this, but the “⌈⌉” operator means ceiling.

As you can see, each address needs to be divided into 20 “Sets” of ports. (But MAP-T likes powers of two, so we’ll have to round that up to 32.) We will assign each set to a different customer. (And leftovers will be reserved for future growth of our customer pool or whatever.)

  • S = Number of “s”ets per IPv4 address
  • P = Number of “P”orts per set

Equations: 2

So, we will divide each address into 32 sets of 2048 ports each.
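If you’d rather check the arithmetic yourself, here is a minimal Python sketch of it. The variable names mirror the ones defined above. The formulas for S and P (round c4 up to the next power of two, then split the 65536 ports evenly) are my reading of the surrounding text, so treat them as assumptions rather than quotes from the RFCs.

```python
import math

r = 24                  # length of the IPv4 prefix (192.0.2.0/24)
p = 32 - r              # length of the IPv4 suffix
a4 = 2 ** p             # available IPv4 addresses
c = 5000                # total customers
c4 = math.ceil(c / a4)  # customers per IPv4 address

# Assumption: S is c4 rounded up to the next power of two.
S = 1 << (c4 - 1).bit_length()
P = 65536 // S          # ports per set

print(p, a4, c4, S, P)  # 8 256 20 32 2048
```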

Warning! The following is an oversimplification that assumes a = 0 and m = 11. Don’t worry about this for now; a and m will be explained later.

Also, take heed of the upcoming multicharacter variable. We’re dropping the formal pretenses now.

The first port of set PSID is P * PSID, and its last port is P * (PSID + 1) - 1. (Where PSID = { 0, 1, 2, 3, …, S - 1 }.)

In English:

Port Set # (aka. “Port Set Identifier,” “PSID”)   First Port   Last Port
0                                                 0            2047
1                                                 2048         4095
2                                                 4096         6143
3                                                 6144         8191
…                                                 …            …
30                                                61440        63487
31                                                63488        65535
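The same table can be generated from the formula above. This is only a sketch under the same oversimplification (a = 0, m = 11); the function name is made up for illustration.

```python
P = 2048  # ports per set, from the running example

def port_range(psid, ports_per_set=P):
    """Return (first_port, last_port) of a port set, assuming a = 0."""
    first = ports_per_set * psid
    last = ports_per_set * (psid + 1) - 1
    return first, last

for psid in (0, 1, 2, 3, 30, 31):
    print(psid, *port_range(psid))
```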

With that in mind, I would like to introduce the notion of Embedded Address bits (“EA-bits”). It’s basically a CE identifier. (In fact, I wish it were called that, but I don’t make the rules.) It’s composed of a concatenation of the suffix of the IPv4 address that has been assigned to the CE, as well as the identifier of its Port Set. We need p bits for the suffix, and q = log2(S) bits for the PSID. In our example, that would be p = 8 and q = 5:

Diagram: EA-bits

As my wishful name implies, each CE has a unique EA-bits number.
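In code, the concatenation is a shift and an OR. The sketch below assumes the example’s p = 8 and q = 5; the function name and the sample CE (192.0.2.1 with PSID 2) are hypothetical inputs I picked for illustration.

```python
import ipaddress

def ea_bits(ipv4, psid, p=8, q=5):
    """Concatenate the p-bit IPv4 suffix and the q-bit PSID."""
    if not 0 <= psid < (1 << q):
        raise ValueError("PSID does not fit in q bits")
    suffix = int(ipaddress.IPv4Address(ipv4)) & ((1 << p) - 1)
    return (suffix << q) | psid

# Suffix 0x01 followed by PSID 00010 in binary.
print(hex(ea_bits("192.0.2.1", 2)))  # 0x22
```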

Note! The general introduction used to refer to EA-bits as “slice ID.”

Note! Only scenario 1 includes PSID. Port Sets only need to exist if the IPv4 addresses are being shared.

Let’s visualize all of that. Please don’t stop staring at this picture until you’ve understood the relationship between each CE’s (hexadecimal) number and its assigned IPv4 address and (decimal) PSID:

Network: EA-bits distribution

Warning! The RFCs define a rather important notion called “MAP domain,” whose meaning is unfortunately significantly inconsistent across the specification. (Probably as a result of its evolution as the documents were written.)

For the purposes of this documentation, I’ve decided to go with the meaning that makes the most sense to me:

The diagram pictured above represents exactly one MAP domain. It’s a group of MAP devices (CEs and BR) that share a common essential configuration known as the Basic Mapping Rule (BMR).

Stick to the diagram for now; I will properly define the BMR later.

Once you’ve designed your own version of that, you’re ready to start assigning IPv6 prefixes to the CEs.

The MAP Address Format

Remember when I lied? Well, here’s the full IPv6 address format defined by the MAP proposed standard:

Diagram: MAP Address Format

Though MAP addresses are part of the CE configuration, they are actually used to mask the IPv4 island clients. (The address you will assign to the CE’s IPv6-facing interface is a separate, completely normal IPv6 address.)

There’s a fair bit of information encoded in the MAP address, which might help you understand and troubleshoot your network. Therefore, here’s an explanation of every field:

End-user IPv6 Prefix

The CE’s unique prefix. All the traffic headed towards this prefix needs to be routed by the network towards the corresponding CE. It is interesting to note that, unless you’re on scenario 3, this is actually the only technically meaningful part of the address; everything else is essentially cosmetics.

Rule IPv6 Prefix

This is just an arbitrary prefix owned by your organization, reserved for CE usage. (All CEs sharing a common MAP domain will have the same Rule IPv6 Prefix.)

Way I see it, if your organization owns 2001:db8::/32, you might for example assign something like 2001:db8:ce::/51 as your “Rule IPv6 prefix.” Each of your CEs would need to pick a subprefix (ie. the End-user IPv6 Prefix) from 2001:db8:ce::/51 to operate.

(These are just examples. Both the Rule IPv6 Prefix and the End-user IPv6 Prefix are technically allowed to span anywhere between 0 and 128 bits, so you can pick lengths that make more sense for your network.)

EA-bits

The CE’s unique identifier. (See Thought Process for the rundown.)

In scenario 1, EA-bits is actually two subfields glued together: the IPv4 address suffix and the PSID. In the other scenarios, EA-bits only contains the IPv4 address suffix.

This field is allowed to be anywhere between 0 and 48 bits long. (32 bits for a full IPv4 address plus 16 for an entire port as PSID.)

Subnet ID

The trailing bits required to assemble a full IPv4 address in scenario 3.

(This field only exists in scenario 3, so ignore it for now.)

Interface ID

I’m guessing the length of IPv6 addresses left the MAP designers with too many surplus bits, and they decided to grant pointless purpose to the leftovers instead of leaving them in reserved status.

The Interface ID is just redundant data. It’s so unnecessary, in fact, that the End-user IPv6 Prefix is allowed to be up to 128 bits long, and in order to accomplish this, it unapologetically overrides the Interface ID bits. (So, even if I stated in the diagram that the Interface ID is 64 bits long, some of its leftmost bits might be chopped off.)

My guess is that this field only exists so that, given a MAP address, you can visually locate the CE’s public IPv4 address and PSID without having to analyze the EA-bits. (Assuming the former haven’t been chopped off.) (And you’ll still need to mentally convert the IPv4 address from hex to decimal.)

Note! Because they can be truncated, Jool doesn’t do anything with any of the Interface ID’s subfields. They simply exist. (Or not.)

Without further ado, the Interface ID is composed of three subfields:

16 bits

Just padding; sixteen zeroes with no meaning.

IPv4 address

Basically the full IPv4 address from which we extracted the EA-bits’s IPv4 address suffix subfield.

It’s also the public side address of the CE’s NAPT.

PSID

The CE’s PSID again, right-aligned and left-padded with zeroes for your viewing convenience. (I guess.)

CE Configuration

Note! Please note that, in this context, “CE” is used to refer to the translator mechanism exclusively (ie. Jool). The NAPT is assumed to be a separate tool, configured independently.

In addition to usually requiring a NAPT to really make sense, a formal minimal CE configuration contains:

  1. The End-user IPv6 Prefix
  2. A Basic Mapping Rule (BMR)
  3. A Default Mapping Rule (DMR)

(More configuration parameters are offered by the standards, but we’ll get to them later.)

CEs sharing a MAP domain will always have the same BMR, and usually the same DMR too. The End-user IPv6 prefix is the only important configuration-wise distinction between them.

For some reason, the RFCs insist that “Mapping Rules” are always triplets of the following form:

{
	<IPv6 Prefix>,
	<IPv4 Prefix>,
	<EA-bits length>
}

This is not really true, but we’ll play along for now.

Let’s define those Mapping Rules:

BMR

Warning! Because the definition of the BMR is intrinsically tied to the concept of a “MAP domain,” the BMR is also inconsistent across the RFCs. Once again, the definition presented here is my preferred one.

The Basic Mapping Rule is a MAP domain’s common MAP address configuration. Basically, this field is the essential piece of configuration that allows the translator to assemble MAP addresses out of IPv4 addresses, and vice versa.

It refers specifically to addresses that will be governed by the MAP address format, not the RFC 6052 address format. Again, the BMR defines the base MAP address configuration that all CEs share, while the End-user IPv6 prefix describes the additional MAP address specifics that belong to one particular CE.

Here’s what each of the triplet fields stands for in the BMR:

{
	<Rule IPv6 Prefix>,
	<IPv4 prefix reserved for CEs>,
	<EA-bits length>
}

The “Rule IPv6 Prefix” is the same one defined above. The “IPv4 prefix reserved for CEs” is exactly what it sounds like (192.0.2.0/24 in the example). The “EA-bits length” is the total length (in bits) of the EA-bits field.

So what does this do? Well, the suffix length of the IPv4 prefix reserved for CEs (p, as defined above) and the EA-bits length (o) describe the structure of the EA-bits, and the Rule IPv6 Prefix length describes their offset. If we define r as the length of the IPv4 prefix reserved for CEs (so p = 32 - r),

  • If o + r > 32, we’re dealing with scenario 1. (q > 0)
  • If o + r = 32, we’re dealing with scenario 2. (q = 0)
  • If o + r < 32, we’re dealing with scenario 3. (q = 0)
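The three cases above boil down to one comparison. A minimal sketch (the function name is made up; o and r are the variables from the list):

```python
def map_scenario(o, r):
    """Classify a BMR by EA-bits length (o) and IPv4 prefix length (r)."""
    total = o + r
    if total > 32:
        return 1  # shared addresses: q = o + r - 32 bits of PSID
    if total == 32:
        return 2  # one full address per CE: no PSID
    return 3      # more than one address per CE: no PSID

print(map_scenario(13, 24))  # 1
```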

In our example, the BMR would be

{
	2001:db8:ce::/51,
	192.0.2.0/24,
	13
}

Which, in turn, will yield MAP Addresses that have the following form:

Diagram: MAP Address Example

Again, for context: These addresses will represent devices on the IPv4 customer islands. (ie. Behind the CEs.)
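To make the format less abstract, here is a sketch of how a MAP address could be assembled from the example BMR, following the field layout described in The MAP Address Format. It is not Jool’s code. The function name and the sample CE (192.0.2.1, PSID 2) are hypothetical, and it only covers scenarios 1 and 2 (no Subnet ID handling beyond leaving those bits at zero).

```python
import ipaddress

def map_address(bmr, ipv4, psid):
    """Assemble a CE's MAP address from a BMR triplet (scenario 1 or 2)."""
    rule6 = ipaddress.IPv6Network(bmr[0])
    rule4 = ipaddress.IPv4Network(bmr[1])
    ea_len = bmr[2]
    addr4 = ipaddress.IPv4Address(ipv4)

    p = 32 - rule4.prefixlen                # IPv4 suffix length
    q = ea_len - p                          # PSID length
    shift = 128 - rule6.prefixlen - ea_len  # bits right of the EA-bits
    if addr4 not in rule4 or q < 0 or shift < 0 or not 0 <= psid < (1 << q):
        raise ValueError("arguments do not fit the BMR")

    # EA-bits: IPv4 suffix followed by the PSID.
    ea = ((int(addr4) & ((1 << p) - 1)) << q) | psid

    # Interface ID: 16 zero bits, the full IPv4 address, then the PSID.
    # An End-user IPv6 Prefix longer than 64 bits overrides the leftmost
    # Interface ID bits, hence the mask.
    iid = ((int(addr4) << 16) | psid) & ((1 << shift) - 1)

    return ipaddress.IPv6Address(int(rule6.network_address) | (ea << shift) | iid)

bmr = ("2001:db8:ce::/51", "192.0.2.0/24", 13)
print(map_address(bmr, "192.0.2.1", 2))
```

For that sample CE the End-user IPv6 Prefix comes out as 2001:db8:ce:22::/64, and the Interface ID repeats 192.0.2.1 (c000:201) and the PSID (2).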

DMR

Default Mapping Rule is just a fancy name for pool6. It’s the “default” prefix that should be added to an outbound destination address so the packet is routed by the IPv6 network towards the BR (and therefore, towards the IPv4 Internet). It has the following form:

{
	<pool6>,
	<unused>,
	<unused>
}

Yes, defining this as a “Mapping Rule” triplet is a stretch. Code-wise, it doesn’t even make sense to implement it as one.

In our example, the DMR would be

{
	64:ff9b::/96,
	<unused>,
	<unused>
}

Again: Addresses masked with the DMR will represent devices on the IPv4 Internet. (ie. Behind the BR.)
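Since the DMR is just pool6, applying it amounts to gluing the IPv4 address onto the prefix. The sketch below assumes a /96 prefix like the one in the example (other prefix lengths place the IPv4 bits differently, and are not handled here). The function name and the sample destination 203.0.113.56 are made up.

```python
import ipaddress

def dmr_translate(pool6, ipv4):
    """Mask an IPv4 Internet address with a /96 DMR (pool6) prefix."""
    net = ipaddress.IPv6Network(pool6)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    addr4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(net.network_address) | int(addr4))

print(dmr_translate("64:ff9b::/96", "203.0.113.56"))  # 64:ff9b::cb00:7138
```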

CE Behavior

When one of the CE’s clients makes an outbound request, the CE uses the BMR to translate the source address, and the DMR to translate the destination address.

Packet flow: CE outbound

Here’s the breakdown:

The opposite happens in the other direction:

Packet flow: CE inbound

BR Configuration

The BR only needs two things:

  • A Forwarding Mapping Rule (FMR) table
  • The Default Mapping Rule (DMR)

The FMR table is a bunch of BMRs. One BMR per connected MAP domain.

In our example, the FMR table would only have one entry:

IPv6 Prefix        IPv4 Prefix    EA-bits length
2001:db8:ce::/51   192.0.2.0/24   13

The DMR is, once again, pool6.

{
	64:ff9b::/96,
	<unused>,
	<unused>
}

BR Behavior

Packet flow: BR outbound

Source is translated by FMR, destination by DMR.

Packet flow: BR inbound

Source is translated by DMR, destination by FMR.

Additional Configuration

If you’re curious to get some hands-on experience, by now you should have the fundamentals required to know what you’re doing if you set up your own MAP-T scenario 1 environment with Jool.

Additional bells and whistles follow:

The a, k and m configuration variables

Ok, so this is a bit of a doozy because the Linux kernel is not terribly well-equipped to deal with these variables, but I’ll explain them nonetheless.

If you were paying close attention, you might have noticed in the example above that, even though we happily assembled 32 port sets, one of them is actually unusable: Port Set zero. Why? Because it contains the “taboo” ports: 0-1023.

To me, personally, this is not a big deal. You just refrain from using Port Set 0 and go eat some cookies. Or, you can set up the NAPT owning Port Set 0 to only use ports 1024-2047 (instead of 0-2047). (You’d assign that particular port set to low-traffic CEs.) But I guess the IETF wasn’t having any of that, and decided to optimize the problem away. It’s optional, but also some definition of “recommended.” You’ll get one extra full port set at the expense of some complexity. You do you.

To understand the solution to the problem, you need to internalize how MAP-T divides the port space. Let’s take a look at this table again, and add some binary representations:

PSID            First Port                       Last Port
0₁₀ (00000₂)    0₁₀ (00000 00000000000₂)         2047₁₀ (00000 11111111111₂)
1₁₀ (00001₂)    2048₁₀ (00001 00000000000₂)      4095₁₀ (00001 11111111111₂)
2₁₀ (00010₂)    4096₁₀ (00010 00000000000₂)      6143₁₀ (00010 11111111111₂)
3₁₀ (00011₂)    6144₁₀ (00011 00000000000₂)      8191₁₀ (00011 11111111111₂)
…               …                                …
30₁₀ (11110₂)   61440₁₀ (11110 00000000000₂)     63487₁₀ (11110 11111111111₂)
31₁₀ (11111₂)   63488₁₀ (11111 00000000000₂)     65535₁₀ (11111 11111111111₂)

See a pattern? Well, the first port always ends in pure zeroes, and the last port always ends in pure ones. But, even more critically, the first q bits of the port number are always its PSID.

Therefore, we can think of a port number as a 16-bit field which can be subdivided into two separate pieces of information:

Diagram: Port Number - 2 fields

The first field tells you which subdivision (“Set”) of the port space the port belongs to, and the second one tells you that port number’s index within that group:

Diagram: Port Division - 2 fields
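The two-field split is a shift and a mask. A minimal sketch, assuming the example’s q = 5 and no third field yet (the function name is hypothetical):

```python
def psid_and_index(port, q=5):
    """Split a 16-bit port into (PSID, index within the set)."""
    m = 16 - q  # bits left over for the index
    return port >> m, port & ((1 << m) - 1)

print(psid_and_index(2048))   # (1, 0)
print(psid_and_index(6143))   # (2, 2047)
print(psid_and_index(65535))  # (31, 2047)
```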

The taboo ports have a similar quirk. 0-1023 happen to be exactly the ports whose first 6 bits are all zero:

Diagram: Port Number - Taboo

So that’s where we’re at. By excluding PSID zero, we effectively also exclude the taboo ports. But we don’t want to exclude PSID zero. What do?

The solution is to add a third field to the port number:

Diagram: Port Number - 3 fields

Warning! Just a heads up: I more or less made up “Port Block” and “Port Index.” The RFC sort of uses them, but not in a formal capacity. “Port Block” is actually called A (though it’s sometimes referred to as i), and “Port Index” is called j.

And by the way: Those are the actual values. The lengths of these values (in bits) are a, q and m, respectively.

Note! In our example, a = 6, q = 5 and m = 5. However, they can be whatever non-negative numbers you need them to be, as long as a + q + m = 16.

The result is a distribution that looks as follows. Each port number is the result of the binary concatenation of its block, then its set, and then its index:

Diagram: Port Division - 3 fields

What have we accomplished with this? Instead of excluding PSID zero, we now exclude Port Block 0. In other words, instead of excluding half of the ports from the first PSID, we exclude the first 2^m ports from every PSID. And now all PSIDs are equal. (Each PSID has (2^a - 1) * 2^m ports.)

Per the three-subfield diagram above, a is the number of bits that will define the Port Block. (It defaults to 6, because that’s exactly the number of bits you need to exclude exactly the taboo ports.) q is however long you need your Port Set ID to be (in accordance with your network needs; q = o - p). m is whatever remains of the port’s 16 bits.
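Here is the three-field version as a sketch, using the example’s a = 6 and q = 5. The function names are made up. Skipping Port Block 0 only when a > 0 is my assumption, chosen so that a = 0 reproduces the earlier simplified table.

```python
def split_port(port, a=6, q=5):
    """Split a 16-bit port into (Port Block, PSID, Port Index)."""
    m = 16 - a - q
    block = port >> (q + m)
    psid = (port >> m) & ((1 << q) - 1)
    index = port & ((1 << m) - 1)
    return block, psid, index

def ports_of(psid, a=6, q=5):
    """List every usable port of a PSID, skipping Port Block 0 if a > 0."""
    m = 16 - a - q
    first_block = 1 if a > 0 else 0
    return [(block << (q + m)) | (psid << m) | index
            for block in range(first_block, 1 << a)
            for index in range(1 << m)]

print(split_port(1024))  # (1, 0, 0)
print(len(ports_of(0)))  # 2016, i.e. (2^6 - 1) * 2^5
```

Note that the ports of one PSID are no longer contiguous: they come in 2^a - 1 runs of 2^m ports each.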

Note! So what’s k, you ask? k is just a synonym for q.

And I know this section has gone for too long already, but there’s one more thing to say:

Remember when I said that, despite what the RFC says, the Mapping Rules aren’t actually triplets, and you assumed that I said it because the DMR has only one field? There’s actually another reason: Mapping Rules are actually 4-tuples. The fourth field is a:

{
	<Rule IPv6 Prefix>,
	<Rule IPv4 Prefix>,
	<EA-bits length>,
	<a>
}

The RFCs seem to be under the impression that a, k and m need to be instance-wide configuration parameters, but the problem is that it forces all the MAP Domains connected to one particular CE/BR to have the same a, k and m values. This might give you some headaches depending on how awkwardly arranged your available IPv4 addresses are.

I have a proof of concept that demonstrates that there is no technical reason to deal with that. Each MAP Domain should be perfectly able to have its own a, k and m, which is why Jool’s implementation includes a in both BMRs and FMRs.

Note! If you’re wondering why Mapping Rules need to define a but not k nor m, note that they already have an implicit k (k = q = o - p, o being the EA-bits length and p being the suffix length of the Rule IPv4 Prefix), and m is just 16 - a - k.
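In other words, a 4-tuple is enough to recover all three variables. A minimal sketch (the function name is hypothetical; clamping k at zero for scenarios 2 and 3 is my reading of "q = 0" in those scenarios):

```python
import ipaddress

def port_params(rule4, ea_len, a):
    """Derive (k, m) from a Mapping Rule's IPv4 prefix, EA-bits length and a."""
    p = 32 - ipaddress.IPv4Network(rule4).prefixlen  # IPv4 suffix length
    k = max(0, ea_len - p)                           # k = q = o - p
    m = 16 - a - k
    return k, m

print(port_params("192.0.2.0/24", 13, 6))  # (5, 5)
```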

FMR on the CE

Warning! Under Construction.

The CEs also have an FMR table. When an outgoing destination address matches one of the FMRs, the FMR is used as translation method instead of the DMR. This allows the clients of CEs to communicate directly with the clients of other CEs, without having to use the BR as a middleman.

(Again, each BMR in the FMR table allows communication to a different MAP domain.)

In fact, a CE’s BMR is usually added to its own FMR table. This allows clients from a MAP domain’s CE to speak directly with other clients from the same MAP domain, but different CE.
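The decision itself can be sketched as a table lookup with a fallback. Everything here is hypothetical: the table contents are copied from the running example, the names are made up, and picking the first matching entry (rather than, say, the longest matching prefix) is an assumption on my part.

```python
import ipaddress

# Hypothetical CE configuration: its own BMR doubles as an FMR entry.
FMR_TABLE = [("2001:db8:ce::/51", "192.0.2.0/24", 13)]
DMR = "64:ff9b::/96"

def pick_rule(dst4):
    """Choose the rule used to translate an outgoing IPv4 destination."""
    dst = ipaddress.IPv4Address(dst4)
    for rule in FMR_TABLE:
        if dst in ipaddress.IPv4Network(rule[1]):
            return "FMR", rule  # another CE: skip the BR
    return "DMR", DMR           # IPv4 Internet: go through the BR

print(pick_rule("192.0.2.7")[0])     # FMR
print(pick_rule("203.0.113.56")[0])  # DMR
```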