How the WWW.*.* list was created:

It turns out I made a bad assumption in the first program that cut out a lot of the domain names, so I am currently redoing all of the pages. I have yet to do the EDU and COM pages. The NET page took less than a day, and I expect the EDU page to take about the same. I have started the COM page a couple of times and expect it to run for a very long time (maybe a week).

Why I started this project is not relevant. What I wanted was a list of valid addresses of the form HTTP://WWW.*.*. I also needed to learn Windows Sockets programming, and this turned out to be a good exercise.

The first cut started at the same place as the current one, ended a little differently, and took more than an order of magnitude longer. I started with the INTERNIC zone files (master lists of all registered domains), stripped the last two words off each entry (*.com, for example), and added WWW. to the front. The next step was a DNS search on each address. If an address resolved successfully, I opened a TCP socket to port 80. On the first cut I would then request the default page at that address and parse the title string. On this latest run I did not request the page or parse the title string. The time-consuming parts are the DNS address lookup and the TCP connection (before, the page request was time-consuming as well). Bottom line: the first run took literally four months, running nearly 24 hours a day over a 33.6k modem connection. This run took four days.

For this run I started with RFC 1035 and found that DNS requests can, and usually do, use UDP. That means there is no time-consuming connection setup, and if there is no response, who cares: for this case I am only interested in the names that respond. This run again starts with the INTERNIC zone files. The first program, ZONE.EXE, extracts the unique domain names from the original zone file. Next, DNS.EXE takes that list, sends four requests a second (which keeps the traffic manageable), and processes the responses. It is interesting to note that I compared these results with those given by gethostbyname(). It turns out that gethostbyname() often gives bad addresses (my TCP/IP stack was the native Windows 95 PPP Dial-Up Adapter), which would slow down the following steps. Next I sorted the list alphabetically with SORT.EXE. Yes, it is a bubble sort, and yes, it was slow. I only really needed it for WWW.*.COM; the rest I did with my editor.
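
To make the UDP lookup described above concrete, here is a minimal sketch of an RFC 1035 query for an A record. It is written in Python rather than Windows Sockets, it is not the code in DNS.EXE, and every name in it (build_query, parse_header, lookup, lookup_all, the query ID 0x1234) is made up for illustration. The resolver address is something you would have to supply yourself.

```python
import socket
import struct
import time

QUERY_ID = 0x1234  # arbitrary ID, chosen for this sketch only


def build_query(name, query_id=QUERY_ID):
    """Build an RFC 1035 query packet asking for the A record of `name`."""
    # Header: ID, flags (only RD, recursion desired, is set), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(packet):
    """Return (id, rcode, answer_count) from the 12-byte DNS header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return qid, flags & 0x000F, ancount


def lookup(name, server, timeout=2.0):
    """Send one query over UDP; True if the reply carries at least one answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False  # no response (or no route): skip this name
    if len(reply) < 12:
        return False
    qid, rcode, ancount = parse_header(reply)
    return qid == QUERY_ID and rcode == 0 and ancount > 0


def lookup_all(names, server, per_second=4):
    """Check names one at a time, pausing so at most `per_second` go out."""
    found = []
    for name in names:
        if lookup(name, server):
            found.append(name)
        time.sleep(1.0 / per_second)
    return found
```

Calling lookup_all(["WWW.EXAMPLE.COM"], "<your resolver's IP>") would return the names that resolved. Unlike the description above, this sketch waits for each reply before sending the next query, which is simpler but slower than sending on a timer and handling the responses as they arrive.
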
The next step is the slowest one: I used SEARCH.EXE to check TCP connections to all of the addresses found. Lastly, I used SPLIT.EXE to turn the list into nicely formatted HTML files. ZONE.EXE, SORT.EXE, and SPLIT.EXE are DOS programs; DNS.EXE and SEARCH.EXE are 16-bit Windows programs. I leave it up to the reader to view the source code and figure out what to pass each program on the command line.
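
The TCP check can be sketched in a few lines. Again this is Python, not the SEARCH.EXE source, and the function names and the five-second timeout are assumptions made for the example. It only tests whether port 80 accepts a connection; it does not request a page.

```python
import socket


def port_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def reachable(addresses, port=80, timeout=5.0):
    """Keep only the addresses that accept a connection on `port`."""
    return [addr for addr in addresses if port_open(addr, port, timeout)]
```

The timeout is what makes this step slow: every address that silently drops the connection costs the full wait, so a shorter timeout trades completeness for speed.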

Source for above programs