Automatically archive files in torrents with a large number of files.

Currently, if a user makes a torrent that contains an extremely large number of files, the torrent file has to index every file name. In some cases this produces a huge torrent file, sometimes too big for some clients to even handle, and it also creates excessive overhead.

Example: If a user wants to share a collection of photos that contains 100,000 files, it would be a huge torrent file with a lot of overhead, but if they were packaged in one or more .zip archives, the amount of work (overhead) the torrent has to do would be greatly reduced.

If BitComet could detect a user attempting to make a torrent with over 1000 files and prompt for permission to archive the files, it would help everyone involved.
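The check proposed above could look roughly like the following. This is only a sketch of the idea, not BitComet code: the function names and the `FILE_COUNT_THRESHOLD` setting are hypothetical, and the value of 1000 is simply the figure suggested in this post.

```python
import os

# Hypothetical setting; 1000 is the figure suggested in this post.
FILE_COUNT_THRESHOLD = 1000


def count_files(root):
    """Count the files under root, including all subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def should_offer_archive(root, threshold=FILE_COUNT_THRESHOLD):
    """Return True when the folder holds more files than the threshold."""
    return count_files(root) > threshold
```

A torrent maker could call `should_offer_archive(folder)` before hashing and, if it returns True, ask the user whether to archive first.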

Perhaps as a test we could make a torrent with 10,000 tiny .txt files, then a torrent with the same content in a .zip, and compare the difference in size.

Also, can anyone think of an easy way to measure the overhead a DHT peer would have running a torrent like this? It would be nice to compare that too.

I made a folder containing 11,337 files.

Each file is a .txt file containing only the word “test”, and each has a random file name.

I made a torrent of this folder with and without padding files, then made a torrent with this folder in a .zip archive (no compression used).
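For anyone who wants to repeat the test, a folder like this and an uncompressed .zip of it can be produced with a short script. This is a sketch and not necessarily how the original folder was made: the function names and the 16-character hex file names are assumptions, and it writes the word "test" with no trailing newline.

```python
import os
import secrets
import zipfile


def make_test_files(folder, count, content=b"test"):
    """Create `count` tiny .txt files with random names in `folder`."""
    os.makedirs(folder, exist_ok=True)
    for _ in range(count):
        name = secrets.token_hex(8) + ".txt"
        # "xb" refuses to overwrite, so a (very unlikely) name clash
        # raises an error instead of silently reducing the file count.
        with open(os.path.join(folder, name), "xb") as fh:
            fh.write(content)


def zip_folder_stored(folder, zip_path):
    """Pack `folder` into a .zip with no compression (ZIP_STORED).

    `zip_path` should be outside `folder`, or the archive would try
    to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                zf.write(full, arcname=os.path.relpath(full, folder))
```

For example, `make_test_files("testfiles", 11337)` followed by `zip_folder_stored("testfiles", "testfiles.zip")` gives the two inputs to make torrents from.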

Torrent size:

Torrent with padding files: 4690kB

Torrent without padding files: 2011kB

Torrent in .zip archive: 2kB

As I expected, it made a huge difference in the size of the torrent, but what I’d really like to know is the impact of the overhead on the swarm.
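The reason the torrent file grows with the file count is that, in the multi-file layout of the BitTorrent metainfo format (BEP 3), every file gets its own bencoded dictionary with a length and a path. The sketch below estimates the size of that file list alone. It is not a reproduction of the figures above: the helper names are made up for illustration, the file names are assumed, and a real torrent also contains piece hashes and whatever extra fields the client writes.

```python
def bencode(value):
    """Minimal bencode encoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be sorted.
        items = sorted((k.encode("utf-8"), v) for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def file_list_size(names, length=4):
    """Bytes used by the bencoded 'files' list for the given file names."""
    files = [{"length": length, "path": [name]} for name in names]
    return len(bencode(files))


# One entry for a 12-character file name costs 36 bytes:
print(bencode({"length": 4, "path": ["abcdefgh.txt"]}))
# b'd6:lengthi4e4:pathl12:abcdefgh.txtee'
```

With a single .zip there is only one name and one length to store, so this part of the torrent stays tiny no matter how many files are inside the archive.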

There was also a wild difference in task size too. Obviously the task size was much larger with padding files, but I never expected so much difference.

With padding files: 354.25MB

Without padding files: 44.28KB

In .zip archive: 1.85MB

And in case you’re wondering, I didn’t mix up the Ks and Ms; the padding files actually made it over 350MB. It didn’t create any padding files on my PC though, so I’m guessing they don’t have to be downloaded: only the 44kB gets downloaded and the padding is made locally. This also raises a question in my mind: is it making a padding file for every file in the torrent, so that you cannot have more than one file in a BT piece? Or does it pack as many files as it can into a piece and top it off with a padding file? I would assume the latter, but in that case there should only be a few padding files, and considering the size of this torrent, it must be one per file. I’m going to start it in another client and see what the file list looks like.

Interesting experiment so far

Edit: I’ve confirmed that it makes a padding file for every single file in the torrent regardless of size, rounding each file up to the next whole number of pieces.

The GUI does give a “popup tip” saying that enabling this option with small files will use more space.

What was the piece size?

Did you go with 32KB?

What was the size of each file?

It’s interesting to see how this all fits together.

Actually non-BitComet clients HAVE to download the padding files. BitComet is smart about this and disregards them when it meets them.

But since the other clients didn’t implement padding files they will treat them just like any other file, hence download them.

That’s what I was saying in the other topic where we were discussing them.

I didn’t even think about the increase in task size, but now also consider the fact that you would have to manually remove each padding file if you want to clean up your download.

If the torrent you downloaded contains a complex directory tree with many levels of subdirectories, you would be in for a lot of fun.

The files were all 1k each (actually less, but Windows doesn’t give an exact byte count).

The piece size was set to automatic, but it used 32k.
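With a 32k piece size, the task sizes reported earlier in the thread can be checked with simple arithmetic. The sketch below assumes each file is exactly 4 bytes (the word "test" with no newline, which the thread does not state precisely), that every file is padded up to the next piece boundary, and that the last file is left unpadded. The function names are made up for illustration.

```python
def padding_needed(size, piece_length):
    """Bytes of padding needed to bring `size` up to a piece boundary."""
    return (-size) % piece_length


def task_size(sizes, piece_length, pad_last=False):
    """Total task size when each file is padded to a piece boundary."""
    total = 0
    last = len(sizes) - 1
    for i, size in enumerate(sizes):
        total += size
        # Assumption: the final file does not need padding after it.
        if i != last or pad_last:
            total += padding_needed(size, piece_length)
    return total


sizes = [4] * 11337        # assumed: 11,337 files of 4 bytes each
piece = 32 * 1024          # 32k piece size

print(sum(sizes))               # 45348
print(task_size(sizes, piece))  # 371458052
```

That is 45,348 bytes (about 44.3 KB) without padding and 371,458,052 bytes (about 354.25 MB) with it, which lines up with the reported task sizes and with one padding file per file.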

“If the torrent you downloaded contains a complex directory tree with many levels of subdirectories, you would be in for a lot of fun.”

I have to agree. Some software contains an insane number of levels filled with tiny files, so this would be a nightmare. If this could be automatically archived, it would prevent the need for a lot of padding files (if align piece boundary is enabled), and even with that option disabled it would still greatly reduce the size of the torrent file, as well as the overhead.

On a recent test, making a torrent with over 11,000 files, the torrent created was 2011kB in size.

The same 11,000 files in a .zip archive created a torrent file only 2kB in size.