How does data sent over the internet know where to go?

OmegaMouse@pawb.social · 1 year ago

How does data sent over the internet know where to go?

darganon@lemmy.world · 1 year ago

There are things called routers that…route traffic. A dumbed-down version: routers talk to other routers to find out which destinations each of them knows how to reach.

If a game server you connect to matches you with someone in Japan, your computer sends a packet with the address in Japan attached to it. Your home router probably has no clue where that is, so it goes to its upstream router and asks if they know, this process repeats until one figures it out and you get a route.

This all happens very quickly, and it’s why people say the Internet routes around damage.

Atemu@lemmy.ml · edit-2 1 year ago

Your home router probably has no clue where that is, so it goes to its upstream router and asks if they know, this process repeats until one figures it out and you get a route.

That’s not how that works. The router merely sends the packet to the next directly connected router.

Let’s take a simplified example:

If you were in the middle of bumfuck nowhere, USA and wanted to send a packet to Kyouto, Japan, your router would send the packet to another router it’s connected to on the west coast*. From your router’s perspective, that’s it; it just sends it over and never “thinks” about that packet again.
The router on the west coast receives the packet, looks at the headers, sees that it’s supposed to go to Japan and sends it over a link to Hawaii.
The router in Hawaii again looks at the packet, sees that it’s supposed to go to Japan and sends it over its link to Toukyou.
The router in Toukyou then sends it over its link to Kyouto and it’ll be locally routed further to the exact host from there but you get the idea.

This is generally how IP routing works; always one hop to the next.

What I haven’t explained is how your router knows that it can reach Kyouto via the west coast or how the west coast knows that it can reach Kyouto via Hawaii.
This is where routing protocols come in. You can look up how exactly these work in detail but what’s important is their purpose: Build a “map” of the internet which you can look at to tell which way to send a packet at each intersection depending on its destination.

In operation, each router then simply looks at the one intersection it represents on the “map” and can then decide which way (link) to send each individual packet over.
The “map” (routing table) is continuously updated as conditions change.

Never at any point do routers establish a fixed route from one point to another or anything resembling a connection; the internet protocol is explicitly connectionless.

* in reality, there will be a few local routers between the gateway router sitting in your home and the big router that has a big link to the west coast
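A minimal sketch of the hop-by-hop example above, assuming a made-up topology where each router only stores the next hop for a destination. The router names, the `ROUTING_TABLES` dict and the `forward` helper are all hypothetical; real routers match on IP prefixes, not city names.

```python
# Hypothetical per-router tables: destination -> directly connected next hop.
# No router stores the whole path; each one only knows its own "intersection".
ROUTING_TABLES = {
    "home":       {"kyouto": "west-coast"},
    "west-coast": {"kyouto": "hawaii"},
    "hawaii":     {"kyouto": "toukyou"},
    "toukyou":    {"kyouto": "kyouto"},
}

def forward(destination, start, max_hops=16):
    """Pass a packet from router to router and return the hops it visited."""
    path = [start]
    current = start
    for _ in range(max_hops):  # crude stand-in for a hop limit
        if current == destination:
            return path
        next_hop = ROUTING_TABLES.get(current, {}).get(destination)
        if next_hop is None:
            raise LookupError(f"{current} has no route to {destination}")
        path.append(next_hop)
        current = next_hop
    raise RuntimeError("hop limit exceeded")

print(forward("kyouto", "home"))
# ['home', 'west-coast', 'hawaii', 'toukyou', 'kyouto']
```

Each loop iteration is an independent decision made by one router, which is the connectionless part: nothing sets up the full path in advance.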

OmegaMouse@pawb.social · 1 year ago

That sounds like quite a messy and inefficient process! But I guess as long as it can be done quickly enough, it doesn’t really matter?

MelastSB@sh.itjust.works · 1 year ago

I think the previous comment omitted something, which is why you think it’s inefficient: routers don’t ask for directions every packet, they record the directions in their route table.

towerful@programming.dev · 1 year ago

At the back-bone scale of the internet, routers actually announce the addresses they are responsible for.
Paths are judged by how specific these announcements are. A router announcing a single IP is the preferred destination, compared to a router that announces a block that contains it. So routers will forward a packet to whichever router more accurately describes the destination IP.
This makes up part of the calculated Path Cost of various routes to reach a destination.
If router A tries to contact router D and knows that router B and C can both forward that packet, router A will send it to the router that announced the lowest path cost to D.

It’s a lot more complicated than that, but that is how datacenters can disappear from the internet (by wrongly announcing they no longer have a path to the IPs inside the datacenter), or how a small ISP can accidentally route the entire internet through their network (by accidentally announcing extremely low path costs). Both of these have happened.
https://blog.cloudflare.com/october-2021-facebook-outage/
https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/

So, the internet is both fragile and resilient.
It can route around damage, but cannot deal with mistakes/maliciousness above a certain “ring” of control.
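To make the path cost idea concrete, here is a toy sketch of router A choosing between neighbours by announced cost. The neighbour names, the cost numbers and the `best_next_hop` helper are assumptions for illustration only; real BGP path selection weighs several attributes (local preference, AS path length and more), so treat this as a simplified model of the description above.

```python
def best_next_hop(announcements):
    """Pick the neighbour that announced the lowest path cost.

    `announcements` maps a neighbour name to the cost it announced
    for reaching the destination (hypothetical numbers).
    """
    return min(announcements, key=announcements.get)

# Router A hears two announcements for destination D.
announcements_for_d = {"B": 3, "C": 5}
print(best_next_hop(announcements_for_d))  # B

# A mistaken announcement with an unrealistically low cost attracts the traffic.
announcements_for_d["small-isp"] = 0
print(best_next_hop(announcements_for_d))  # small-isp
```

Nothing in this selection step checks whether the announcement is true, which is the fragile part.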

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

So, the internet is both fragile and resilient. It can route around damage, but cannot deal with mistakes/maliciousness above a certain “ring” of control.

and this, kids, is why we don’t like Cloudflare and DNS services.

glimse@lemmy.world · 1 year ago

I’m no expert but it seems like the most efficient way with the given technology! The hops between routers are much less frantic than (I think) you’re imagining.

To oversimplify, think of it like boxes in boxes where each box is a router.

Your PC is in the first small box. It says “I want to connect to [IP]” and the box says “I don’t have that IP, let me ask the bigger box”

The bigger box (your ISP) says “I don’t have it either, I’ll ask the big box”

The big box says “I don’t have it but based on the address, I know it’s in this other big box”

Other big box says the same thing and sends it to another small box. That small box has the PC you’re looking for and the packet is delivered!

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

I’m no expert but it seems like the most efficient way with the given technology! The hops between routers are much less frantic than (I think) you’re imagining.

not just this, it’s also worth considering that laying cables is expensive, so you better damn well use them. A system like this also ensures a very wide range of pathing. And in turn, a very spread out use pattern.

wkk@lemmy.world · 1 year ago

https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:the-internet/xcae6f4a7ff015e7d:routing-with-redundancy/a/internet-routing

I wouldn’t call that “messy and inefficient” but you do you. I’d be curious to know what’s a “clean and efficient” solution for you when it comes to routing packets around the planet :)

OmegaMouse@pawb.social · edit-2 1 year ago

Ah yeah, this and @MelastSB@sh.itjust.works’s comment clarify the routing table thing. Before, I was assuming they just blindly forwarded stuff until one router knew where to go, but if they have a rough idea from the IP address prefix, that makes more sense.

towerful@programming.dev · 1 year ago

They don’t have a rough idea; they have a very accurate picture of where they should send a packet based on the IP address.
Routers at the internet-backbone scale actually announce the IP addresses they are responsible for, as well as other routes (with an additional path cost added) that they can reach.
So, they match a destination IP to the most specific IP block in their routing table (for example, with entries for 8.8.8.0/24 and 8.8.0.0/16, a destination of 8.8.8.8 matches the 8.8.8.0/24 route) and forward the packet to the router that announced it.
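The matching rule in that example (most specific block wins) can be sketched with Python’s standard `ipaddress` module. The `ROUTES` table and the router names in it are hypothetical, and a real router does this lookup in specialised hardware rather than a loop.

```python
import ipaddress

# Hypothetical routing table: announced block -> router that announced it.
ROUTES = {
    ipaddress.ip_network("8.8.8.0/24"): "router-A",
    ipaddress.ip_network("8.8.0.0/16"): "router-B",
}

def lookup(destination, routes=ROUTES):
    """Return (block, next router) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # most specific block
    return best, routes[best]

print(lookup("8.8.8.8"))  # matches both blocks, the /24 wins
print(lookup("8.8.4.4"))  # only the /16 contains it
print(lookup("1.1.1.1"))  # no matching block in this toy table
```

A /24 covers 256 addresses and a /16 covers 65,536, so the larger prefix length is the more precise description of where the destination lives.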

Routing at the internet scale is much smarter than routing at the home (even business) level

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

it’s not efficient from the perspective of organization. But the thing nobody tells you here is that packets have no predefined route, they take whatever route gets them there optimally. So it’s highly redundant, and very fault tolerant. When you consider that, for what it does, it’s a highly efficient routing system.

To the point where you could cut an undersea cable, and traffic would still route perfectly fine, albeit probably a lot slower, assuming that isn’t your only connection of course. The fact that it works at all is kind of a miracle.

jordanlund@lemmy.world · 1 year ago

Oh boy! You’re one of the lucky 10,000!

Watch this! It’s only 10 minutes or so.

https://youtu.be/O7CuFlM4V54

PhobosAnomaly@feddit.uk · 1 year ago

Every time

sex.com 🔨

This is fine🔥🐶☕🔥@lemmy.world · 1 year ago

The almighty banhammer

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

ah yes, a classic.

OmegaMouse@pawb.social · 1 year ago

Oh wow, this unlocked a memory! Pretty sure I watched it back in school. Quite informative, though it felt like it skipped a lot between leaving the host computer and reaching the destination - is it just the same process over and over until it reaches the right place?

CondorWonder@lemmy.ca · 1 year ago

Yes, the packet passes through routers at each stage and they direct the packet to the ‘closest’ path based on its destination, until the final router has the destination on its network. This can happen a few times (for something in your ISP network), or 10-30+ times for something further away.

bfg9k@lemmy.world · 1 year ago

I remember watching this in high school IT class lol

brygphilomena@lemmy.world · 1 year ago

So at a basic level, we’ll only talk about routers. Every computer/server on a network has an address. When your computer wants to talk to another, it attaches the IP address of the destination computer to every piece of data that leaves your computer, saying where that data wants to go.

It goes from your computer to your router, which has a table of the addresses it knows (your network at your house) and the address of another router to which it sends everything it doesn’t know.

This happens a few times before your data gets to a router that says “oh, I know a router that knows someone that knows where that is” and sends it that way, until it reaches a router that knows the specific computer to send it to.
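A rough sketch of that home-router decision, assuming a typical home LAN block and a made-up upstream address taken from a range reserved for documentation. `LOCAL_NETWORK`, `UPSTREAM_ROUTER` and `next_hop` are hypothetical names, not anything a real router exposes.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")  # assumed home LAN
UPSTREAM_ROUTER = "203.0.113.1"  # assumed ISP gateway (documentation-only range)

def next_hop(destination):
    """Deliver locally if the address is on the home network, else send upstream."""
    addr = ipaddress.ip_address(destination)
    if addr in LOCAL_NETWORK:
        return "deliver locally"
    return UPSTREAM_ROUTER  # the "everything I don't know" default route

print(next_hop("192.168.1.42"))  # deliver locally
print(next_hop("8.8.8.8"))       # 203.0.113.1
```

The home router’s table is tiny because almost every destination falls through to the one upstream entry.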

Septimaeus@infosec.pub · 1 year ago

Packet headers.

A packet is like a sealed mailing envelope. Its headers are like things written on the face of an envelope, including an address. Chunks of data on the internet are so many letters in these envelopes, carried and delivered by a network of other computers.

DerArzt@lemmy.world · 1 year ago

To expand on this: every website that you go to online (e.g. www.google.com) is backed by an IP address.

some_guy@lemmy.sdf.org · 1 year ago

And the Domain Name (Google.com) gets converted from words we understand to the IP address. This is the Domain Name System, or DNS. Everyone on the network agrees that Google.com equals 142.250.189.174. If that address changes, the change gets passed through the system until everyone agrees on the new IP address. DNS is how your computer learns the address.
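If you want to see the lookup from your own machine, here is a small sketch using Python’s standard `socket.getaddrinfo`, which asks the operating system’s resolver. The `resolve` helper is a hypothetical wrapper, and the addresses you get back will likely differ from the one quoted above, since large sites hand out different addresses depending on where and when you ask.

```python
import socket

def resolve(hostname):
    """Return the unique IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("google.com")` needs a working network connection and returns whatever your resolver currently answers, which may include both IPv4 and IPv6 addresses.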

thesmokingman@programming.dev · 1 year ago

The simplest explanation is that my computer doesn’t know where to go for everything but does know where to go to get answers. It sends its traffic to the place that will know where to send things. Rinse and repeat until you finally hit the place you wanted to go.

A more complete answer if you chase everything down is the traceroute manpage.

Zippy@lemmy.world · 1 year ago

The comments here are correct, with one high-level component missing: the very top-level routers, the ones designed for tier 1. I started an internet company and we got large enough to decide to become a tier 1 provider. The one big difference in this configuration is that we publish our own blocks of IPs and we listen for published IPs. We had routers that essentially maintained a list of where all the IPs or blocks of IPs worldwide needed to go. More importantly, I would send out a list of my IP blocks that would propagate across all the tier 1 routers across the world. That could take an hour, but more likely minutes.

Having this allowed me to essentially connect to the internet at zero cost. There is some cost to be assigned IPs, but I was trusted. While I say zero cost, I still had to pay for large-bandwidth dark fiber to New York or other major meet-me points. I also had to pay for rack space to put a tier 1 router into these buildings. But what it really gave me was the ability to have multiple connections to the pipes, and because I published my own IPs, I could balance all the routes and other providers could find the best way to me thru a process called weighing. Also, if I lost a connection, which is rare at this level, I could rapidly and automatically republish my route on working connections, and usually within 15 minutes all the routers in the world would know. 15 minutes is actually likely long. These days, 5 minutes.

Now the interesting part of this: I publish my own IPs. I have to be extremely careful, as with a single stroke I could say I own all of China’s IPs. Well, likely a few strokes. I certainly could make a simple mistake and take control of a shit load of IPs. That means suddenly traffic could come to me that was destined for another country. More correctly, because they are publishing too, it would just make a mess and take some IPs down. If I published a big block in China, I would essentially DOS myself, because the pipe sizes I buy are factors smaller. Now, this is a trusted system because we all connect together randomly. There is not, and cannot be, any central control, as we all need to publish freely for this to work. But if I were to screw up and, say, divert a shit load of IPs destined for, say, Washington, it would rapidly be figured out and I would rapidly be determined to not be trusted. I would be shut down physically at some point.

Essentially I have fairly normal routers with one feature that allows them to dynamically keep track of all the routes worldwide and to periodically publish all the IPs I own.

LainTrain@lemmy.dbzer0.com · edit-2 1 year ago

I’d like to know this as well, actually, but on a physical level. I understand the TCP/IP stack well enough, but what is the circuitry that actually sends the light down the correct cable?

FuglyDuck@lemmy.world · edit-2 1 year ago

it doesn’t send it down the correct cable. It sends it on.

Imagine your friends. You need to talk to somebody. Let’s call him Garry. You don’t know Garry’s contact info. So instead, you pull out your phone and text Sally, asking her to ask Garry if he knows where your glasses are. Sally pretty much knows everyone. Or at least, you thought she did. Reality is she sent it to Becky, who sends it on to Steve. Now, Steve is the one who invited Becky to Garry’s party, and because… reasons, Becky invited Sally who invited you… so now, Steve relays the question to Garry.

Garry hasn’t seen your glasses, but he does have a weird set of car keys with a giant Charizard key fob… maybe they’re yours? So, he sends his reply to Steve, who forwards it to Becky, who sends it to Sally, who giggles and asks if you really have a Charizard key fob.

You get the idea. Only unlike people, the data usually doesn’t get mangled.

swab148@startrek.website · 1 year ago

Hey, Garry found my keys!

LainTrain@lemmy.dbzer0.com · 1 year ago

Fair! Thanks

batmaniam@lemmy.world · 1 year ago

I wrote up a whole thing that didn’t post. There’s good answers here but I think that, like me, you wanted a more “voltage based” one.

Short answer is they don’t. Everything on the network is always listening, and security is based solely off of a handshake. Everything is always employing a fancy multimeter that measures voltage high/low as a 1/0 turning it from bits to bytes etc. The router listens to that and decides where to send it upstream, which it isolates from downstream.

For a realllllly basic example, look at the Modbus protocol. That’s also why industrial equipment folks get real touchy about network access. For things like computers, there’s talk back and forth to verify. Modbus is just “if the byte is the thing, I do the thing”. But fundamentally, that’s the physical basis: all devices are always listening, and the TCP/IP stack is what tells them what to disregard.

LainTrain@lemmy.dbzer0.com · edit-2 1 year ago

But surely that can’t really be true either. Like, if I post a selfie on Instagram in London, some guy’s Minecraft server in Minnesota can’t be receiving that and be like “oh not for me - ignore”. It just seems horribly inefficient. But maybe I’m having trouble conceptualising how fast light is? 😅

And based on another answer ITT by FuglyDuck, it would seem that once you’ve resolved a domain you do send it to a central hub that then resolves subnets until it gets to its destination, so I can imagine that it does so by physically sending it down “the right cable” as it gets past each layer to get to the final destination via the recipient’s ISP, but imagining it as a giant automated telephone switchboard is all my feeble software brain can comprehend it as, and that doesn’t seem right either.

~~Edit: well actually network switches do operate on the data link layer, but also not on the physical one?

I guess what I’m trying to say is: if I’m sending a packet to Japan from the UK - once my packet reaches a hub of a first tier ISP, does it just go down every oceanic cable in every direction, or the one that actually is in the direction of Japan?~~

The answer is that yes - the internet is just a telephone switchboard between what amounts to otherwise isolated networks of ISPs, and exchange points physically send light down the correct cables with switches:

https://en.m.wikipedia.org/wiki/Internet_exchange_point

batmaniam@lemmy.world · 1 year ago

Yes, sorry, I did oversimplify to the local network. On your local network everything is always listening, but absolutely your home router/modem in Kansas does NOT excite some wires in Tokyo unless you tell it to lol.

And it sounds like you know way more about the software than I do, but I can say with confidence that when a router starts putting oscillating high/low on a cable, everything on that cable “sees” it. I’m fairly sure that’s why different address blocks have the limits they do; there are only so many addresses you can have without needing to oscillate that voltage stupid fast.

You should look into some of the serial examples for Raspberry Pis/Arduinos; with your software background you’d probably really enjoy it! It’s funny to run into things like the fact that you can have issues like the wire not going back to low sometimes, and the myriad physical issues.

And seriously check out MODBUS. It’s crazy how “simple” it is. With no handshake and a standardized data format, you can trigger all sorts of stuff. That’s the protocol that controls most people’s industrial things, including GIANT pumps and valves.

deur@feddit.nl · 1 year ago

Ethernet is the common PHY even across fiber afaik.

agent_flounder@lemmy.world · edit-2 1 year ago

The circuitry doesn’t determine which cable is the correct one. That is determined by a protocol that associates various IP networks with different network interfaces. So, for example, all data going to 192.168.5.0/24 goes to interface eth0, 192.168.0.0/24 goes to eth1, 10.0.0.1 goes to eth2, and so on. Each interface is a separate RJ45 Ethernet port on your router, for example. It doesn’t have to be RJ45; your router could have a Thick Ethernet or Thin Ethernet connector. Or it could have wifi. Or something else.

Anyway, forwarding the packet to the correct interface / subnet can be done with a static route defined on the router. Another way is dynamic routing using BGP (border gateway protocol) which is an exterior gateway protocol that dynamically routes between your network and somewhere exterior to your network. Yet another protocol is OSPF (open shortest path first) which is used inside a corporate network for dynamic routing.

For any of these the router knows how to send the IP packet to the next hop, another router, which in turn knows how to send it to the next hop.

Where to send is based on the destination IP. The routers know which interfaces and which other routers are responsible for different subnetworks.

It is sort of like how once your mail makes it to a main hub in your state, it is then routed to the main hub for the destination state, and from there to the post office responsible for the destination zip code, and then to the mail route (and hence truck) responsible for the street and number.

So if your destination is 1.1.1.1, maybe there is a router known to be responsible for 1.0.0.0/8, and it knows which router is responsible for 1.1.0.0/16, and so on until we get to a router that has 1.1.1.1 on one of its subnets, which then sends directly to 1.1.1.1.

LainTrain@lemmy.dbzer0.com · 1 year ago

IPs and packets are well and good and I do have a decent working knowledge of TCP/IP, but what physically is actually happening? Thanks for replying anyway!

agent_flounder@lemmy.world · edit-2 1 year ago

Physically, at the physical/link layers, an Ethernet transceiver integrated circuit takes data provided by the CPU and sends it as signals over the Ethernet cable plugged into the RJ45 port, which is how it communicates with the switch. By looking at the datasheet and IEEE 802 specs, one could figure out more detail.

Sethayy@sh.itjust.works · 1 year ago

Look into MII, RMII and RGMII if you want a google starting point

quinkin@lemmy.world · 1 year ago

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

basically, the entire TL;DR of this post, from someone who is a linux nerd, that knows some things about networking.

Everything knows where everything is, and if it doesn’t, it knows something else that does, and if that doesn’t, well, repeat ad nauseam. The technicality here is that not every individual point knows where every other individual point is, but it knows its immediate neighbors. And those immediate neighbors do as well, at the high routing level, think data center.

Think of it like a tree structure, but a really fucking big one, and with a lot of circular and unusual connection points. You can get from one point, to any other point. It’s just a matter of knowing how.

Also, to be pedantically accurate here, the internet is a hodge podge of packet flinging hardware, “routes” aren’t really a thing. Packets will take whatever route is determined to be optimal by the hardware it interacts with. I.E. it dynamically changes as needed, that’s why your ping is always variable

OmegaMouse@pawb.social · 1 year ago

Thanks, this is a good summary. It’s useful to know about the dynamically changing route - that explains a lot.

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

nice root federated instance btw

OmegaMouse@pawb.social · 1 year ago

Root federated?

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

we’re on lemmy which is a federated service, essentially the tl;dr is decentralized. Root federation in this context refers to the instance that hosts your account. In my case dbzer0, in your case pawb.

Personally i’ve found it really interesting seeing the sub niche interactions between different federated platforms. It’s a weird look into how humans tend to associate.

OmegaMouse@pawb.social · 1 year ago

Ah gotcha! Yeah it’s pretty neat seeing the ways in which the instances intermingle. Some communities stay pretty niche and used only by local users with the same interests, whereas others are melting pots of every instance. I guess it’s a bit like a society with little towns and bigger cities.

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

yeah, it’s interesting to see in comments and other communities as well.

The vast majority of accounts seem to be on lemmy world though. Which is interesting.

seang96@spgrn.com · 1 year ago

I didn’t see this mentioned yet, but IP ranges are normally assigned by general location, so each of these routers routing to the next one (hops) basically has a table in memory, built from prior routes or configured by ISPs, that says “this is the best current upstream router to route to for this destination”. They also store the distance between routers and aim for the smallest distance. These are called routing tables, and they are why routing is fast.

Routing tables can be misconfigured, causing major outages, and old routers used to be able to store only a smaller table, so 512k day happened. We already passed the next one, 768k, though ISPs mostly had their crap together for that one.

ComradeSharkfucker@lemmy.ml · 1 year ago

Electrons are sentient actually