
Diary about producing wiki articles about technical issues during the course of software development


This covers a time span of about 7 hours, including writing the email about what was being done

Hey Michael....

Didn't want to mix apples and oranges by mentioning some other cool stuff I'm doing at the moment unrelated to the CDS stuff on that thread....

Watching bobsgear reboot and check its drives for consistency... One of the drives says it has "gone 48126 days without being checked, check forced". I love working on 130 year old servers, amazing how Linux can run for more than a century between reboots.....

I am working on a java app to automatically generate ever-growing websites from refactored wikipedia content. The goal is to have the tool grow the sites a little bit each day, to have multiple styles for rendering the wiki content to web pages, and eventually to have multiple ways for the robot to augment the content from wikipedia.... Eventually I want the app to publish to multiple targets like the different wikis I run, flat files, remote FTP web hosting accounts, etc..... Examples: http://www.listofinsects.com/ or http://www.listofprogramminglanguages.com/ or http://www.listofbattles.com or http://www.listofoccupations.com/ or http://www.listofparadoxes.com or http://www.listofairports.net (I've got 50+ such domains).

http://www.owsd.org has a lot of open source free website templates you can use. As you can see, I have partially ported several of them. There is still some "lorem ipsum" placeholder text, which I am gradually removing as I add more features to the java app (robot). Since google likes to see sites that are being updated, making tweaks to the template is a great way to cause updates to all the pages of the sites, thus making google believe the sites are being actively developed. Hopefully, the different templates throw google off the scent that the same tool is generating all the sites. Eventually, the tool will have some different styles for rendering the body content of the pages.

I've been driving the tool with an XML file that describes what to do for each domain, e.g.

<web><name>List of occupations</name>

<usetemplate>org.owsd.bamboo.2col</usetemplate>

<topkeywords><keywordslist>occupations</keywordslist></topkeywords><domain>listofoccupations.com</domain>

<topics><maintopic>Category:Occupations</maintopic>

<topicslist>

List of military occupations

Lists of people by occupation

List of metalworking occupations

Contingent workforce

Operation Power Pack

Category:Labor

Shoeshiner</topicslist></topics>

<outputconfig><extension>.htm</extension><output1><dir>/home/listof/domains/listofoccupations.com/public_html</dir></output1><output101><dir>c:\data\buildweb\domains\listofoccupations.com</dir></output101></outputconfig><output101><dir>c:\data\domains\listofoccupations.com</dir></output101></web>

After it builds each website, it keeps track of some information about additional topics that could be added. This intermediate info can be thrown away and rebuilt. It just seems kind of cool to have one config that I edit to tell the tool what to do, and another config that it edits to keep track of what it might do in the future:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><web><name>List of airports</name>
<topicsneeded><topic><oi>0</oi><o>List of airports</o><l>List of the largest airports in the Nordic countries</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>List of cities with more than one airport</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>ms:Senarai lapangan terbang</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>List of airports in the United States</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>no:Lufthavnliste</l></topic>

<topic><oi>0</oi><o>List of airports</o><l>Lists of military bases</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>airport</l><t>airports</t></topic>
<topic><oi>0</oi><o>List of airports</o><l>List of Airport Museums</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>es:Lista de aeropuertos del mundo</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>Category:Lists of airports</l><t> </t></topic>
<topic><oi>0</oi><o>List of airports</o><l>List of IATA-indexed train stations</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>World's busiest airport</l><t>busiest</t></topic>
<topic><oi>0</oi><o>List of airports</o><l>:Category:Lists of airports</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>World's busiest airports by traffic movements</l><t>by air traffic</t></topic>
<topic><oi>0</oi><o>List of airports</o><l>:Category:Airports by city</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>Airline destinations</l></topic>
<topic><oi>0</oi><o>List of airports</o><l>List of eponyms of airports</l></topic>
</topicsneeded></web>

It's been fun learning how to parse and produce these XML files from java. It's not nearly as clean or easy as in C#.
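For reference, here is a minimal sketch of reading the tool-maintained file with the DOM parser that ships with the JDK. The class and field names are made up for illustration and are not taken from the actual tool. It also assumes that <l> is the link target and <t> is optional display text, which is only a guess from the sample above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicsNeededReader {

    /** One <topic> entry. Field meanings are assumptions based on the sample file. */
    public static final class NeededTopic {
        public final int originIndex;    // <oi>: index of the page that wants the link
        public final String originTitle; // <o>: title of that page
        public final String link;        // <l>: topic not yet on the site
        public final String linkText;    // <t>: optional display text, null when absent

        NeededTopic(int originIndex, String originTitle, String link, String linkText) {
            this.originIndex = originIndex;
            this.originTitle = originTitle;
            this.link = link;
            this.linkText = linkText;
        }
    }

    /** Parses every <topic> element in the document; <oi>, <o> and <l> are assumed present. */
    public static List<NeededTopic> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("topic");
            List<NeededTopic> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element topic = (Element) nodes.item(i);
                result.add(new NeededTopic(
                        Integer.parseInt(childText(topic, "oi").trim()),
                        childText(topic, "o"),
                        childText(topic, "l"),
                        childText(topic, "t")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse topics XML", e);
        }
    }

    /** Returns the text of the first descendant element with the given tag, or null if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<web><name>List of airports</name><topicsneeded>"
                + "<topic><oi>0</oi><o>List of airports</o><l>Airline destinations</l></topic>"
                + "<topic><oi>0</oi><o>List of airports</o><l>airport</l><t>airports</t></topic>"
                + "</topicsneeded></web>";
        for (NeededTopic t : parse(xml)) {
            System.out.println(t.originTitle + " wants " + t.link);
        }
    }
}
```

Even for a flat file like this, the DOM route takes a fair amount of ceremony (factory, builder, node lists, casts), which is roughly the gap compared with C#.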

Looking at the above, each topic is something not yet added to the website, along with the page that wanted to link to it. I groan at the repetition of the <oi> page index, and the <o> title of the page that goes with it. That should be in its own section.... It's rapidly looking like I need to have a SQL database for the intermediate results.
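One way to get rid of the repetition without a database yet is to group the wanted links under the page that wants them. The sketch below is only an illustration: the class name, the `<origin>` element and its `o` attribute are hypothetical and not part of the tool's real format, and the <oi> index is left out for brevity (it would sit next to the title on the same element).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicGrouper {

    /**
     * Groups {originTitle, link} pairs so each origin title is stored once.
     * A LinkedHashMap keeps origins in first-seen order.
     */
    public static Map<String, List<String>> groupByOrigin(List<String[]> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    /** Renders the grouped form as XML. The <origin> element is a hypothetical layout. */
    public static String toXml(Map<String, List<String>> grouped) {
        StringBuilder sb = new StringBuilder("<topicsneeded>\n");
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            sb.append("  <origin o=\"").append(escape(entry.getKey())).append("\">\n");
            for (String link : entry.getValue()) {
                sb.append("    <l>").append(escape(link)).append("</l>\n");
            }
            sb.append("  </origin>\n");
        }
        return sb.append("</topicsneeded>\n").toString();
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"List of airports", "Airline destinations"});
        pairs.add(new String[] {"List of airports", "List of eponyms of airports"});
        System.out.print(toXml(groupByOrigin(pairs)));
    }
}
```

This is the same shape a two-table SQL layout would have (one row per origin page, many rows of wanted links pointing back to it), so it could serve as a stepping stone before moving the intermediate results into a database.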

So I was looking up alternatives for SQL from java. I could just use one of the copies of SQL server that I run, but then I couldn't easily make this java program available as a standalone program without some complicated setup instructions, and requiring someone to have their own copy of SQL server. So I created a wiki page with a couple of standalone alternatives for doing SQL from java without needing any SQL server. hsqldb is one I've seen mentioned a lot, so I downloaded it....

http://www.bobsgear.com/display/ts/Java+Database+Choices

So then I was looking at some examples on the net of how to use it. It is tedious to use JDBC to talk to SQL databases from java. LINQ for C# is way easier once you get over the learning curve.

I found a wrapper class that looked cool..... Then I needed to make some updates to the code (since it was from 2003), and while doing some "wouldn't it be nice if" thinking, I started running into problems, figured them out, then created a wiki page about that:

http://www.bobsgear.com/display/ts/Dealing+with+unimplemented+exceptions

Then I was arguing with the networking on the server when I couldn't get to my subversion repository. I recognized it as a dropped gateway problem. I had previously written a wiki article, back in 2007, about problems with dropped gateways, so I reread my article to refresh my memory about what I'd learned, and read some of the things I'd found in my previous research.

http://www.bobsgear.com/display/ts/Multiple+Gateways+on+Windows+2003+Server

It's interesting to go to the Tools -> Info and look at the list of google queries people were using to find this page.

From reading that, and the linked pages, I thought I figured out a fix to the problem, tried it, and it seemed to work, so I added a note:

http://www.bobsgear.com/display/ts/Fixing+a+dropped+gateway+on+Windows+Server+2003+with+new+route+entry

Then when it didn't work, I added an update to that page....

Then I realized a lot of this email would make a nice wiki article, so I then created:

http://www.bobsgear.com/display/bobsgear/A+morning+in+the+life+of+a+wiki+user

The parent page to that has links to other ideas about how to use wikis.

A healthy dose of paranoia: I didn't mention too much about the java tool on the wiki page above because I am paranoid about google. The point of having a robot develop content is to increase the amount of stuff I have online that can attract web traffic... Imagine if all 50 of those domains each generated a dime or even a quarter every day. But google hates automatically generated content. So I am paranoid about talking publicly about automated methods to create content. They don't mind serving ads on things driven by a database, like a message forum, etc., but they don't like content that isn't original or useful to users. I think my robots, when they produce mashups, are producing something that is unique, and useful to humans. So google and I agree to disagree and the battle goes on.... However, I do document these kinds of projects in a private space in the wiki.

Ditto my networking configurations for my computers. It's easy enough for someone to ping www.bobsgear.com, and then start nosing around to figure out the OS, etc. but I don't want to give any helpful information by publishing actual configs online in that gateway article. That might be too paranoid, I don't know. But in a private space, I have a lot more information about my servers, with histories of the configurations of the machines, my networks, list of VMs, lists of what I've installed, changes, how I have things mapped etc.

Periodically, I take that information and export to a PDF file so I can have the information available in the event I need it to fix the system if it is down. Of course, periodically making offline backups is important too.

Hope you enjoyed all of this....

  • Garnet