This is a question that’s been rattling around in my head for a while now. I’m just curious to understand how ISP connections work.
There was a case a while ago in Ottawa where certain Wikipedia articles about politicians were found to have been consistently edited in favour of the politicians in question.
Wiki investigators and media folks were able to find that the changes in question came from ISP links in the federal government. The assumption was that staffers for the politicians in question were the ones doing the favourable edits, but I don’t think they could nail it down to a specific computer.
How come? If they have the ISP link, can’t they trace it the next step and find out what computer was linked to the ISP to make the edit?
Or is the computer that links to the ISP just not recorded?
Most businesses use private IP addresses for the computers on their business networks. Then, to allow their machines to access the Internet, they map those private addresses to one or more public/ISP addresses. A router on the border connecting the private network to the public network/ISP link handles the mappings dynamically. If a business is constantly logging the connection mappings, then tracing back to the originating computer is possible. But this is a lot of data to log, and many places do not bother.
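To illustrate why the logging matters, here is a minimal Python sketch of tracing back through logged mappings. The log layout, the addresses, and the function name find_internal_hosts are all made up for illustration; real border routers and firewalls log in their own formats, if they log at all.

```python
from datetime import datetime

# Hypothetical NAT log entries:
# (mapping start, mapping end, private IP, public IP, public port)
NAT_LOG = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 5), "10.1.2.30", "192.0.2.10", 40001),
    (datetime(2020, 1, 6, 9, 1), datetime(2020, 1, 6, 9, 3), "10.1.2.77", "192.0.2.10", 40002),
]

def find_internal_hosts(public_ip, when, public_port=None):
    """Return the private IPs that were mapped to public_ip at time `when`.

    If the remote site also recorded the source port, pass it to narrow
    the result; with only an IP and a time there may be several candidates.
    """
    matches = []
    for start, end, private_ip, pub_ip, pub_port in NAT_LOG:
        if pub_ip != public_ip or not (start <= when <= end):
            continue
        if public_port is not None and pub_port != public_port:
            continue
        matches.append(private_ip)
    return matches  # an empty list means the trail ends at the border router
```

With only the public address and a timestamp, the lookup can return more than one machine; without any log it returns nothing, which is the situation described above.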
Basically internet traffic moves across many different layers and those layers only know the information that they need to for their job.
For example, the aide’s PC would connect to a Government of Canada switch that then connects to a GoC router, their ISP’s router, a bunch of routers to reach Wikipedia’s ISP, then Wikipedia’s router and finally whatever switch the server is on.
The IP address allows the routers to talk to each other, but you’d need the information on the local router to know which switch the traffic goes to, and then the information from that switch to know which PC. That isn’t public, which is why the ISPs are bombarded with requests to disclose their logs so people can start lawsuits.
For consumers, a switch and router are often integrated into the same device. For a large organization, it’s separate hardware.
Yes. Certain IP address blocks are owned by certain organisations. For example 16.0.0.0/8 is owned by HP and 17.0.0.0/8 is owned by Apple. Other organisations have smaller blocks. Individual IP addresses may be static - permanently assigned - or dynamic - temporarily assigned. The latter is more usual in office situations as it is more easily managed. Unless the assignments are logged then you’re not going to be able to map the IP address to the device. To make matters worse, organisations can have a public IP address, dynamically assigned by their ISP, behind which are a multitude of private IP addresses, all dynamically assigned. (Think of a housing estate with one entrance / exit road.) That’s double the trouble!
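As a rough sketch of how an address is matched to an owning block, Python’s standard ipaddress module can do the membership test. The two blocks below are just the examples from this post (allocations change over time), and block_owner is a hypothetical helper name.

```python
import ipaddress

# Example blocks taken from the post above; not an authoritative registry.
KNOWN_BLOCKS = {
    "HP": ipaddress.ip_network("16.0.0.0/8"),
    "Apple": ipaddress.ip_network("17.0.0.0/8"),
}

def block_owner(address):
    """Return the owner of the block containing `address`, or None if unknown."""
    ip = ipaddress.ip_address(address)
    for owner, network in KNOWN_BLOCKS.items():
        if ip in network:
            return owner
    return None
```

This only gets you as far as the organisation. Which device inside the block held the address at a given moment still depends on that organisation’s own assignment logs.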
In the old days, the MAC was hard-coded into a ROM on the network adapter. These days the MAC is stored in flash memory, and on many network adapters it can be changed as easily as just going into the network adapter’s settings and typing in a new number.
It’s permanent in the sense that it won’t change on its own, but it’s easier to change than it was in the old days.
ISPs (particularly phone companies) may have only a few IP addresses, and then map those to individual phones. And they keep records, because they are required to do so (depending on where you are, and what the rules are where you are).
Governments and large corporations do the same kind of mapping, but don’t keep the same kind of records, because they aren’t required to do so.
There used to be more ways to easily track and keep this kind of information. And on small networks it made sense to do so when storage and network connections were expensive.
Now tracking and keeping this kind of information is expensive and difficult, and runs into privacy laws, and only makes sense for organisations that are (1) large enough to handle the difficulty, and (2) have a particular reason for wanting to do so.
I thought the dynamic IP was sent outside the local network as part of the packet address. Something must go outside: if two computers on the local network are talking to Wikipedia at the same time, Wikipedia needs to be able to differentiate between them. Otherwise the replies would be scrambled. I thought the dynamic local IP address was part of the packet, but it could be some number assigned by the router. So only the owner of the router would be able to figure out which computer talked to Wikipedia that day.
Now of course part of the WWW protocol is to provide additional information, for instance the previous page accessed, and information about the computer and browser being used. Of course that may not be very identifiable, lots of people used PCs and Firefox, but it could be a clue. And additional information may be asked for by the web server. I don’t know.
I think so. As I recall, sometimes, if I use my tablet, I get a message from an email provider that there was a login from a different device, even though it was through my home wi-fi router and would have the same IP number. But maybe it is only detecting Android instead of the usual Windows.
Wikipedia receives the IP of the router, along with its MAC, and messages going between your PC and Wikipedia have a connection ID associated with them. The router knows how to translate the connection IDs back and forth so that the message gets back to your PC and doesn’t go to a different machine on the same network.
Wikipedia does not get the local IP of your computer, only the IP of the gateway device (typically a router) that makes the final connection out onto the internet.
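A toy sketch of that translation, in Python. In common NAT setups the "connection ID" is effectively a port number on the router’s public address; that is an assumption about the typical case, and the class name ToyNat, the addresses, and the starting port are invented for the example.

```python
class ToyNat:
    """Very simplified port-based NAT table for one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Rewrite an outgoing source address; the remote site sees only the result."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_in(self, public_port):
        """Find which internal machine a reply belongs to, or None if unknown."""
        return self.inbound.get(public_port)
```

Two PCs behind the same router get the same public IP but different public ports, so replies are not scrambled, and only the router’s table can map a port back to a local machine.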
If you want the gory details, these will get you started:
What a website (like Wikipedia) receives from your browser:
What type of browser (IE, Chrome, Firefox, etc).
Information about your system, like OS, CPU type, display resolution, and possibly other info like how much battery charge you have left.
What plugins are running on your browser.
Your connection speed and your IP, which can be used to locate you geographically (sometimes the IP location isn’t terribly accurate, since it may be going through a router that is many miles away from where you actually are).
What social media you are currently logged into, like Facebook and Twitter.
Any cookies that are set by your browser, which track all kinds of stuff like what sites you have accessed and what ads you have viewed or clicked on. (they call them “cookies” because the developers who created them were smart enough to realize that most people won’t object to something called a “cookie” but would object to the more appropriately descriptive name “evil spy-like tracking tags that record your every movement on the web”).
What website referred you. In other words, if you saw a wikipedia link on the SDMB and clicked on it, Wikipedia would know that it was a link on the SDMB that you clicked on to get to Wikipedia.
Some of this can be blocked using ad and script blockers. Cookies can be defeated using private browsing options available in most browsers. Some of this you don’t want to block. If you block things like your browser type, operating system, and display resolution, the web page might not display properly.
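Some of the items in the list above (browser type, referring site, cookies) travel as plain HTTP request headers, while things like display resolution are normally read by scripts running in the page rather than sent as headers. Here is a small Python sketch of pulling the headers out of a request; the sample request, the boards.example.com address, and the cookie value are invented for illustration.

```python
# A made-up request of the kind a browser might send when following a link.
RAW_REQUEST = (
    "GET /wiki/Ottawa HTTP/1.1\r\n"
    "Host: en.wikipedia.org\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0\r\n"
    "Referer: https://boards.example.com/thread/123\r\n"
    "Cookie: session=abc123\r\n"
    "\r\n"
)

def parse_headers(raw):
    """Return the request headers as a dict with lower-cased names."""
    headers = {}
    for line in raw.split("\r\n")[1:]:   # skip the request line
        if not line:                     # blank line ends the header section
            break
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers
```

The server sees these values on every request, whether or not it chooses to store them.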
I don’t know what info Wikipedia tracks. I know that they track IP, because they store IPs along with edits.
Google tracks every little scrap of information it can get about you. This includes data from ads on other web sites served up by google ads. They spy on you harder than the KGB does.
As engineer_comp_geek says, there is a lot of information a website can extract from your machine if it chooses to. Even without some of the extreme information (cookies, etc.) the information provides a “signature” — not as unique as a fingerprint perhaps, but pretty useful.
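One simple way to picture such a "signature" is as a hash over whatever attributes the site can see. This Python sketch is only an illustration of the idea; the attribute names and the fingerprint function are hypothetical, and real fingerprinting scripts collect far more than this.

```python
import hashlib

def fingerprint(attributes):
    """Hash a dict of observed browser attributes into a short, stable signature."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute sets for two visitors.
visitor_a = {"browser": "Firefox", "os": "Windows", "screen": "1920x1080", "timezone": "UTC-5"}
visitor_b = {"browser": "Firefox", "os": "Windows", "screen": "1366x768", "timezone": "UTC-5"}
```

Each attribute on its own is common, but the combination narrows things down: visitor_a and visitor_b differ only in screen size and still get different signatures.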
How many websites actually gather and save such “signatures”? I doubt if Wikipedia does, but I’m just guessing.
But people should be outraged about how many sites gift their information to other sources. To give an example of how invasive this data can be, here is a portion of the JavaScript that a company called Adthink was serving up a while ago.
Cookies, tags, pixels and IDs make it possible to track down individuals, and the Dope does use Google Analytics.
The _ga cookie for GA stores your Client ID, for up to two years of inactivity. Wikipedia doesn’t use these tools, but it is just an example of how public your information is and how easy it is to assign it to an individual.
Given enough probable cause to do enough legal discovery, they could find out the information by inspecting everyone’s browsers for the session information.
The cost/effort is fairly high and there are legal barriers to overcome on the collection. But never assume anything you do on the internet is private or untraceable.
Many sites have dozens or more trackers and some like the empty gif trackers and browser fingerprinting can easily track you even if you don’t log in.
“Browser fingerprinting” can work despite the client’s attempts to block tracking.
Once again Wikipedia would have to have this data, and would need to be willing to share it. Due to their model I doubt they do this type of analytics, but machines and users do expose unique fingerprints in multiple ways.
To wrap up a few things about IP numbers.
Back in the day, each computer had its own IP address, and directly appeared on the Internet. If my computer accessed your computer you could log that access and see it by IP address, and in principle sheet it home to the exact machine. This has long since ceased to be how things work.
We ran out of enough IPv4 addresses to match the number of machines that need network connections decades ago. But it didn’t matter. Internally just about every computer sees the Internet via some sort of address mapping that swizzles around the IP addresses so that what you see on the naked Internet is not the IP address that the machine thinks it has. The most ubiquitous mechanism is NAT (Network Address Translation). At home your router connects your computers to the Internet via a NAT bridge. Your whole household only has one real IP address. The NAT function lets every computer in your household have an IP address (one that is only unique within your household, and chosen from a set of addresses that are defined to be never routed.) Your ISP allocates your connection the IP address (usually dynamically - although you might be able to pay more to have a static one.)
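The "never routed" addresses used behind a home NAT are, for IPv4, the three private ranges set aside in RFC 1918. A short Python check, with is_rfc1918 as a made-up helper name:

```python
import ipaddress

# The three IPv4 private ranges reserved by RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """True if `address` falls in one of the private (non-routed) IPv4 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918)
```

An address like 192.168.1.10 can exist in millions of households at once, which is why it tells an outside observer nothing; only the router’s public address is meaningful on the Internet.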
ISPs keep track of the mapping of the IP address they allocate to your router, at least for a while, and this allows for investigations (with suitable warrants) to trace back to the household that had the IP address of interest at the time.
MAC addresses are not used in routing, only point to point. They never make it past the first router, and so are never visible externally. You can see the MAC addresses of everything on your local area network, but never past that.
Large organisations will always want to firewall and quarantine communications at their boundaries. Thus they don’t let naked access in or out, and will often need to map more machines wanting access to the Internet to the range of addresses they have available, and so will provide NAT or a NAT-like function at their boundary. You might see IP addresses that can be mapped back to individual computers, especially for organisations that were early users of the Internet. (Especially those very early companies that have class A addresses, and have 2^24 addresses.) But most don’t.
Companies that provide large scale internet services (e.g. Amazon, FB, Google) have very sophisticated network management front ends that connect buildings filled with servers to the Internet, yet appearing as one solitary IP address. These systems perform dynamic load balancing and a whole lot more. The above companies are rolling their own gear now, but the next tier down of web services companies continue to make the likes of Cisco and Juniper a lot of money. This sort of network gear comes at serious money.
All of this is about locating the precise physical computer someone used to connect from. Useful if you are looking to find a particular person. The stuff described in the posts above about cookies, fingerprints and the like is all about being able to tell if the same computer comes back again. Which is a different, but sometimes even more valuable thing. You might connect from a whole range of WiFi hotspots about town with your laptop. But web sites still see the same you, and know it is you returning.
Your statements are true and accurate in the case of the classic IP Version 4 (IPv4). IP Version 6, which is becoming the default for many newer computers and more broadly supported by network providers, changes a lot of this guidance.
IPv6 assigns a unique and world-visible IP address to each host on the network. Network Address Translation doesn’t usually happen. The IPv6 address (known as the “Global Address” or “Global Scope”) of the computer is visible and definitive behind almost every standard router and firewall configuration. Also, the default method of IPv6 address assignment embeds the MAC address of the computer in that world-visible address, so the computer can be identified with great specificity.
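The MAC-embedding scheme referred to here is usually called modified EUI-64: the MAC is split in half, ff:fe is inserted in the middle, and one bit of the first byte is flipped. Below is a Python sketch of it, with the caveat that many current operating systems use randomised or temporary interface identifiers instead, so this does not apply to every IPv6 host. The prefix, the MAC, and both function names are made up for the example.

```python
import ipaddress

def eui64_address(prefix, mac):
    """Build an IPv6 address from a /64 prefix and a MAC using modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02                               # flip the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]    # insert ff:fe in the middle
    interface_id = int.from_bytes(bytes(eui), "big")
    network = ipaddress.ip_network(prefix)
    return ipaddress.IPv6Address(int(network.network_address) + interface_id)

def mac_from_eui64(address):
    """Recover the MAC from an EUI-64 style address, or None if it is not one."""
    raw = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits
    if raw[3:5] != b"\xff\xfe":
        return None
    octets = [raw[0] ^ 0x02, raw[1], raw[2], raw[5], raw[6], raw[7]]
    return ":".join(f"{octet:02x}" for octet in octets)
```

When an address was built this way, anyone who sees it can reverse the steps and read off the MAC, which is the privacy concern being described.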
Wikipedia will not see the MAC of your home router. The source (and destination) MAC on the packet is updated as it transits each layer 3 hop (each router) on its path between your router and Wikipedia’s server.
The crazy thing is that 20-30 years ago, when IPv6 was being developed, MACs were being hidden (removed from APIs and shared information), and Intel was being pilloried for providing identifiable IDs. Because privacy, and government.
I don’t know if it just means that people have come to their senses about traceable IDs, or if it means that we’ve all come to love Big Brother (and it doesn’t matter if Winston is shot now). I’m old enough to find it odd.