
Category Archives: Status Update

I have been working on code cleanup and spending the majority of my time on the thread-safe heartbeat, which is proving to be a little more difficult than I expected.

Right now the plan is to poll the jobs table every x seconds (probably 300) and, if jobs are found, to do a DISTINCT on the table to see whether there are multiple groups.  If there are multiple groups, spawn off parallel processing of each group.  Given the number of actions for each group, I expect that most polls will return more than one group to process.

Alongside the heartbeat thread there is going to be a worker thread that checks things like backlog, group headers, releases to process, tables to clean up, etc.
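A rough sketch of that polling plan, written in Python rather than the application's C# and using an in-memory-friendly SQLite connection in place of MySQL. The `jobs` table layout, the `group_name` column, and every function name here are assumptions made up for illustration, and `process_group` is only a placeholder for the real per-group work.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_pending_groups(conn):
    """Return the distinct group names that currently have jobs waiting."""
    rows = conn.execute(
        "SELECT DISTINCT group_name FROM jobs ORDER BY group_name"
    ).fetchall()
    return [row[0] for row in rows]


def process_group(group_name):
    """Placeholder for the real per-group work (hypothetical)."""
    return f"processed {group_name}"


def heartbeat_once(conn, max_workers=4):
    """One poll: find the groups with jobs and process each group in parallel."""
    groups = fetch_pending_groups(conn)
    if not groups:
        return []
    # The connection is only touched on the calling thread; workers get names only.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_group, groups))


def run_heartbeat(conn, stop_event, interval_seconds=300):
    """Poll until stop_event (a threading.Event) is set, sleeping between polls."""
    while not stop_event.is_set():
        heartbeat_once(conn)
        # wait() returns early if the event is set, so shutdown is not delayed.
        stop_event.wait(interval_seconds)
```

Waiting on an event instead of sleeping means the loop can be stopped straight away rather than after the full interval.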

Also a short break will be taken on March 5th for the release of a game, so work will likely slow a bit.

Well, no documentation got done; however, I did the initial testing with the release stats.  I ran a test gathering the stats for 300 releases.

The stats that I was gathering were size, parts, and total parts.  Using the DB for everything ended up costing me around 1 hour and 50ish minutes.  I then reran the tests, gathering just the initial objects from the DB and paring down in memory.  I was able to get the 300 releases done in a little under 6 minutes.

Had some problems with the size calculation due to SOME releases being posted larger than 120 GB, but I got that all fixed up.

So, looking at the immediate list of things to do, 3 is now pretty much complete, and 2 is pretty much done but I'll be moving on to another phase for that.  7 has been started but that's gonna be a work in progress.

I am considering a few different paths right now, toying with the thought of setting up parallel processing of releases, as I feel that the 6 minutes could be cut down to around 3 or 4 minutes.
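The in-memory approach could look something like the sketch below: fetch the rows once, then aggregate in a single pass instead of querying per release. This is Python rather than C#, and the row shape `(release, size_bytes, total_parts)` and the function name are assumptions for illustration, not the actual schema.

```python
from collections import defaultdict


def release_stats(rows):
    """Aggregate size, parts found, and expected total parts per release.

    rows: iterable of (release, size_bytes, total_parts) tuples, one per part,
    fetched from the DB in a single query (hypothetical shape).
    """
    stats = defaultdict(lambda: {"size": 0, "parts": 0, "total_parts": 0})
    for release, size_bytes, total_parts in rows:
        entry = stats[release]
        entry["size"] += size_bytes          # running byte total for the release
        entry["parts"] += 1                  # parts actually seen
        entry["total_parts"] = max(entry["total_parts"], total_parts)
    return dict(stats)
```

A release is complete when `parts` equals `total_parts`, which can be checked without going back to the database.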

I have been thinking about the master thread; basically it's just going to query a bunch of the tables to see what the current status is.

Basic job concept is now working.   Jobs are pulled from the DB and processed.  Need to correct a problem where tables are not checked for existence before being dropped; the resulting exception is currently just passed over rather than halting operations.  I am going to throw a few groups into the table and pull a few million headers down.
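One way to handle the drop problem is to check for the table first, so a missing table is an expected outcome and any other failure still raises. A sketch in Python against SQLite (the function names are made up for illustration; on MySQL the existence check would go against `information_schema.tables` instead of `sqlite_master`, and `DROP TABLE IF EXISTS` is another option):

```python
import sqlite3


def table_exists(conn, table_name):
    """Return True if a table with this name exists (SQLite catalog lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


def drop_table_checked(conn, table_name):
    """Drop the table if it exists; return whether anything was dropped."""
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    if not table_exists(conn, table_name):
        return False
    conn.execute(f'DROP TABLE "{table_name}"')
    return True
```

With the check in place there is no need for a catch-all around the drop, so genuine errors such as permissions or locks are no longer swallowed.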

There are a few ERRs popping up in the tabs, so I will need to throw that in the list of things TODO.

Tomorrow I will spend some time putting together the documentation about the semi-final structure of the database and working on a heartbeat worker.

6 new classes later, the main program is now only a very small fragment of what it used to be.  This is now in prime position to become a nice multi-threaded application.

I will now move onto getting the db structure in shape and start doing some tests with multiple groups.

I also have a thought to parameterize the DB connection details, the base file location and a toggle for Windows and *nix systems.  By parameterize I mean in C# terms, so creating a .properties file with the details.
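A minimal sketch of reading such a file, in Python rather than C#. The simple `key=value` format with `#` comments and the key names used in the usage note (`db.host`, `base.path`, `platform`) are assumptions for illustration only.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings
```

For example, a file containing `db.host = localhost`, `base.path = /data/nntp` and `platform = nix` would come back as a dictionary with those three keys.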

Today I am working on splitting up the main functions in preparation for the threading.

This is proving more difficult than previously expected.

Good progress today, did my first trial run through of the DB integrated application with good results.  3 minutes and 52 seconds to fetch, store, process and create 135 NZBs.  Right now the group is static; however, given the progress so far, the next step will be to grab a list of groups from the database.

From there, there are different paths to consider.  At the forefront are:

1. Setting up the backlog vs current processing systems

2. Setting up threaded processing of each of the different phases (right now I'm thinking [NNTP, Disk2DB processing, DB2REL processing, REL2File processing])

3. DB structure for release (Date, Full subject, Size)

4. Basic front end

5. Search systems (maybe Sphinx)

6. Master thread

7. Set up basic working table, status codes etc

8. Modify groups table to reflect backlog and current headers.

So there are quite a few different tasks to tackle, and I need to prioritize the work.  I think that it will be 3, 8, 7, 2, 6, 4, then the rest.

I did a performance profile after reading a good article (I’ll link it later) that talked about the hazards of continually analyzing performance while you’re writing your application.  I need to be careful of this so I don’t keep going back and looking at performance.  The biggest hit on performance right now is, oddly, the date conversion from datetime to double.  I will need to figure out exactly what this conversion is doing to see if there is anything I can do about that performance hit.

Program did programish things!!!!

Clean up is going well; I added a few more classes to group like operations.  For example, there is now a DB class for the interactions around the structure of the DB.  I ran into a bit of a problem with namespace-level global variables, which I overcame by creating a couple of new objects.  One of these objects now contains the NNTP details that used to be global variables, and it gets passed around to the methods that require them.

During the DB clean up I stumbled upon a neat little feature of LOAD DATA INFILE: if the data in a file is tab separated, it automatically splits the data into columns, which saved me some cojiggery.  I have also found out that parameterization of the ODA class statements for MySQLConnection doesn’t work for anything EXCEPT parameters used within ()’s.  This was a bit of a setback, as I had hoped to stop doing string manipulation for base group names, but unfortunately I will have to keep doing that, at least for the time being.

The final item that I am looking at is the difference between doing comparisons in MySQL vs List<string> comparisons.  Without any caching of the operation it takes around 40 seconds to do a distinct comparison and insert for 300k records.  I can only guess that this is going to be a VERY costly way to generate a release list, so I will be building two List<string>s and comparing them to see how much better that performs.  My guess is not by much.
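The parameter behaviour fits how SQL parameters work in general: they bind values, not identifiers such as table names. If the group name has to end up in a table name, one option is to validate it strictly before any string formatting. A sketch in Python, where the naming scheme (dots replaced by underscores plus a `_headers` suffix), the `posted` and `subject` columns, and the `%s` placeholder style are all assumptions for illustration:

```python
import re

# Lowercase newsgroup-style names only, e.g. "alt.binaries.example".
_GROUP_PATTERN = re.compile(r"[a-z0-9]+(\.[a-z0-9_+-]+)*")


def table_name_for_group(group, suffix="headers"):
    """Map a group name to a table name, rejecting anything unexpected."""
    if not _GROUP_PATTERN.fullmatch(group):
        raise ValueError(f"unexpected group name: {group!r}")
    safe = re.sub(r"[^a-z0-9]", "_", group)
    return f"{safe}_{suffix}"


def select_headers_sql(group):
    """Build a query: the table name is formatted in, the value stays a parameter."""
    table = table_name_for_group(group)
    # %s is the placeholder style of common Python MySQL drivers (assumption);
    # the date value itself is still passed separately and bound by the driver.
    return f"SELECT subject FROM `{table}` WHERE posted >= %s"
```

The string manipulation stays, but only names that pass the pattern can ever reach the SQL text.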
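For the in-memory side of that comparison, how the two lists are compared matters a lot: checking every candidate against every existing entry is quadratic, while a hash set lookup keeps it roughly linear. A sketch in Python (in C# the equivalent container would be `HashSet<string>`); the function and parameter names are made up for illustration.

```python
def new_release_names(candidates, existing):
    """Return candidate names not already in `existing`, de-duplicated, in order."""
    known = set(existing)   # one-time cost, then constant-time membership checks
    seen = set()
    fresh = []
    for name in candidates:
        if name not in known and name not in seen:
            seen.add(name)
            fresh.append(name)
    return fresh
```

Only the names in the result would then need to be inserted into the release list.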

A forgotten accomplishment was that I started the initial DB structure creation methods in the application instead of notepad.

 

Decided to change things up a little and do more work on the filesystem over the database, as MySQL seems quite slow for single-insert types of operations.  I have parameterized the transactions to try to speed up insert time; however, that doesn’t seem to matter too much.

The next test would be to create a stored procedure for these inserts and call that instead of using inserts.

For the remainder of the week I need to do a code cleanup, as it looks like shit at the moment from all the development.  I need to split out some unused (but maybe used later) code to NNTPzzz, and I will be creating a new class in the namespace for the DBINFILE operations.

I did test out creating a large table which contains a “complete” view of each header and that worked pretty well, but I need to figure out how to tokenize an INFILE transaction so that I can move from flat file to specific columns.

Database work has been mildly successful.  I used INSERT but that ended up being very slow; instead I ended up using LOAD DATA INFILE and it is A LOT quicker.  Using inserts it was taking around 40 minutes to insert around 300k records into the DB.  When I wrote the results from the news server out to disk and then used LOAD DATA INFILE, I loaded the same 300k records in about 3 minutes.
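The write-to-disk-then-load approach might be sketched as below, in Python rather than C#. It relies on the MySQL defaults for LOAD DATA (tab-separated fields, newline-terminated lines, backslash as the escape character); the table name, column names and file path in the usage note are hypothetical.

```python
def to_tsv_line(fields):
    """Format one record as a tab-separated line safe for LOAD DATA defaults."""
    cleaned = []
    for field in fields:
        text = str(field)
        text = text.replace("\\", "\\\\")  # double backslashes (default escape char)
        for ch in ("\t", "\r", "\n"):      # separators inside a field become spaces
            text = text.replace(ch, " ")
        cleaned.append(text)
    return "\t".join(cleaned) + "\n"


def load_data_sql(path, table, columns):
    """Build a LOAD DATA INFILE statement mapping file fields to named columns."""
    if "'" in path or "\\" in path:
        raise ValueError(f"unsupported characters in path: {path!r}")
    column_list = ", ".join(columns)
    return f"LOAD DATA INFILE '{path}' INTO TABLE {table} ({column_list})"
```

Writing each header with `to_tsv_line` and then running the statement from `load_data_sql("/tmp/headers.tsv", "headers_raw", ["subject", "poster", "bytes"])` replaces hundreds of thousands of single inserts with one bulk load, and the column list is what maps the flat file onto specific columns.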

Now I am trying to figure out how I am going to order the rest of the tables and what information they should contain.  I think that the raw tables may not actually be needed in the end.

Right now I am thinking;

Release filename table (this will have parts etc.; will need to check for existence).

Release table (Releases, size and date).

This is all TBC….