For a further bit of nostalgia, the page itself is archived from another site by using the program Teleport Pro, which is also from about the same time. This can be discovered by trying to follow the bottom link to Webalizer's site.
(Coincidentally, looking for a modern alternative to Teleport, particularly on Mac, is a surprisingly fruitless endeavour—at least if one has grown to limit themselves to open-source software. Meanwhile I expect at least two of the sites I sometimes use to go belly-up soonish.)
As for Webalizer itself: one of my first jobs was to make a log analyzer in Perl, and since I knew almost nothing about performant file access, string splitting and data crunching, the resulting code ran fairly slowly. So iirc I soon found Webalizer as the replacement, since it was written in C and by more competent people. Or perhaps I chose AWStats, can't remember.
(Come to think of it, the Perl thing might run okayish by now, since it was a stock Perl loop over the lines with a regular expression—bad but not fatal. Now I'll have to find the cd with the encrypted archives and figure out what was wrong with the code, dammit.)
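For reference, the kind of "loop over the lines with a regular expression" described above could look roughly like this in Python. This is only a sketch, not the original Perl: the pattern assumes Common Log Format, and the names `LINE_RE` and `count_hits_by_host` are made up for illustration.

```python
import re
from collections import Counter

# Assumed input: Common Log Format, i.e.
#   host ident user [time] "request" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def count_hits_by_host(lines):
    """Count requests per client host, skipping lines that do not parse."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # malformed or unexpected line: ignore it
        hits[match.group("host")] += 1
    return hits
```

Passing an open file object as `lines` reads the log one line at a time, so memory use stays flat even for large logs. Requests that contain escaped quotes will not match this simple pattern and are skipped.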
bcx 541 days ago [-]
Thanks for the nostalgia :)
I cut my teeth in a very similar way and ended up with the same conclusion: Webalizer was far more maintainable than AWStats. Though the UI on AWStats was definitely pretty nice. At some point I actually had a Perl CGI script that I was selling that did basic website analysis (can't find any references to it on the internet quickly).
I still use webalizer on a small site I self-host for my dad.
Most of these hostnames wouldn't be found in the logs nowadays. AOL being the top, followed by UUNet, Mindspring and Earthlink.
I've never heard of those other providers. And I wonder who used printserver.swissplace.ch to make 258 visits in 1999?
adsl-216-103-75-178.dsl.snfc21.pacbell.net - Some lucky sod had ADSL in 1999!
heavensgate.futurebyte.net - Presented without comment.
zimpenfish 540 days ago [-]
> adsl-216-103-75-178.dsl.snfc21.pacbell.net - Some lucky sod had ADSL in 1999!
IIRC I knew a few UK people who had ADSL in 1999 but they were part of the BT trials, not a general release of ADSL (because the UK's broadband is an omnishambles of a monopoly.)
Webalizer is still shipped by default with cPanel, which although maybe not popular here is still by far the most commonly used hosting control panel. There are surprisingly few tools for quick log file analysis, goaccess already mentioned being a notable exception.
Tijdreiziger 541 days ago [-]
At least one DirectAdmin-based host I know of has it too, I wouldn't be surprised if it's ubiquitous.
You can't run GoAccess on these kinds of systems (as a customer) because you usually don't have SSH access. Even if you do, many customers on these kinds of hosts might not know how SSH works.
For what it does, Webalizer is still pretty nice, but unfortunately most of what you'll see on any serious website will be bot traffic.
I'm still using webalizer, it's an excellent and high performing tool that generates a nice website.
wojtczyk 541 days ago [-]
Same here. Still love it.
esher 542 days ago [-]
As far as I understand The Webalizer is only using server side logs to get all this information. No JS and no GDPR required. Impressive to me.
aasasd 541 days ago [-]
There's also likewise ancient AWStats, made in Perl. IIRC I used it for some reason instead of Webalizer, but that might be simply due to better support of Perl on shared hosting of that time.
Though by now, after having experience with performant setups, I would probably choose to chuck the logs directly into something like logstash+elasticsearch, or something more lightweight. Because I came to believe in processing data on arrival, not on request. Though perhaps Webalizer or AWStats themselves can process small batches, idk.
mhitza 541 days ago [-]
You still need a privacy policy that states that your server collects IP addresses (PII/personal data), but there's no need for a user to consent to it IMO because it's intrinsic information when accessing the website. Although you should state how long the IP addresses are stored within the logs/backups.
capableweb 541 days ago [-]
Indeed. Easiest solution I found to this "problem" is to mask the last parts of the IP and not store those parts at all.
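A minimal sketch of that kind of masking, using Python's standard `ipaddress` module. The function name `mask_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions picked for illustration, not a legal recommendation; whether a masked address still counts as personal data is exactly what is argued about further down.

```python
import ipaddress

def mask_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an IP address before it is written anywhere.

    The default prefix lengths are illustrative assumptions only.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input; they are dropped.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

print(mask_ip("216.103.75.178"))  # 216.103.75.0
```

The masking has to happen before the line reaches the log file; masking only in the report leaves the full address on disk.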
seqastian 541 days ago [-]
> no GDPR required
Whoever told you that is wrong. It doesn't matter whether the client or the server collects the data.
tommica 541 days ago [-]
Can't see any PII info in the demo, so the claim should be correct
seqastian 541 days ago [-]
In the "Top 30 of 10484 Total Sites" section there are hosts and IPs.
If they remove the end of IPs and the first part of hosts, it's considered 'anonymised' by Matomo or GoAccess. But you are still collecting IPs in the server logs and need a GDPR statement documenting it.
dspillett 541 days ago [-]
Has it not been successfully argued that an IP address is not sufficiently identifying (in various court cases wrt piracy)? If so then the source host's name, either given directly in the protocol or returned from an rDNS query, shouldn't be either.
warpspin 541 days ago [-]
> Has it not been successfully argued that an IP address is not sufficiently identifying (in various court cases wrt piracy)? If so then the source host's name, either given directly in the protocol or returned from an rDNS query, shouldn't be either.
No. For purposes of the GDPR, IP addresses are considered personally identifying information.
A bit of background: it had long been accepted in EU law that a statically assigned IP address is PII. For years it was contested whether dynamically assigned IP addresses also fall under this, as the owner of a website has no means to actually trace such an IP address back to a natural person. Here the highest EU court basically decided that as long as even a third party (the internet provider assigning the dynamic IP address) is able to identify the person using an IP address at a certain time, dynamically assigned IP addresses also have to be considered PII, and therefore all IP addresses.
mypetocean 541 days ago [-]
Another thing to consider is that in many PII laws, data which can be combined with other data to reconstruct a complete instance of PII will itself count as PII.
So if a full hostname or IP counts as PII, then a _partial_ hostname or IP also counts.
aasasd 541 days ago [-]
This logic seems to suggest that ‘a’ is my PII, because you can get ‘aasasd’ from it if you throw the dice enough times.
Karunamon 541 days ago [-]
If accurate, the fact that you need a GDPR statement to cover the default logging configuration of every HTTP server created in the last 30 years strikes me as ridiculous. Wouldn't your own analytics/abuse prevention/optimization count as legitimate interest and not need a statement or consent?
I'm serious, the practical effect of what you described would be the internet equivalent of a California prop 65 warning that is on basically everything. I.e. meaningless. Everything in California causes cancer, everything on the internet has your IP address. And the "informed" person is not one bit better off as a result.
dariosalvi78 541 days ago [-]
Collecting IPs can be perfectly lawful without consent [0] if you justify it for security or performance, but you need to make sure to get rid of logs when they no longer fulfil those purposes.
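Getting rid of old logs can be automated. A rough sketch, assuming rotated log files all live in one directory and match `*.log*`; the function name `purge_old_logs` and the retention period passed in are hypothetical and would depend on your own stated purposes.

```python
import time
from pathlib import Path

def purge_old_logs(log_dir, max_age_days, now=None):
    """Delete log files whose last modification is older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(log_dir).glob("*.log*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Run from a daily scheduled job, this keeps the retention period close to whatever the privacy policy states. Backups that contain copies of the logs need the same treatment separately.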
This tool did me a good service for years without needing any JavaScript. I finally decided to get rid of all access logging on my personal websites when GDPR came around, so now it's also gone.
SadWebDeveloper 541 days ago [-]
Why? GDPR only applies if you use the data for 3rd party market research; storing the data for your own debugging purposes is not covered.
pvorb 531 days ago [-]
I didn't want to dig into details like these. Turning it off was the lowest possible effort for me.
https://webalizer.net/sample/site_199905.html
https://curia.europa.eu/juris/document/document.jsf?text=&do...
[0] https://law.stackexchange.com/questions/28603/how-to-satisfy...