largefile problems - inform(ation)al pieces


http://ftp.sas.com/standards/large.file - http://ac-archive.sf.net - mailto:guidod@gmx.de

inform(ation)al pieces

slashdot: about largefiles: "The 'l' in lseek()"

Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)

When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).
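The figure in the joke is 2^31: a signed 32-bit offset reaches at most 2^31 - 1, so offsets 0 through 2,147,483,647 address exactly 2,147,483,648 bytes. A small sketch of that boundary check follows; `fits_off32` and `OFF32_MAX` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value a signed 32-bit offset can hold: 2^31 - 1. */
#define OFF32_MAX ((int64_t)INT32_MAX)

/* Offsets 0 .. 2^31 - 1 cover exactly 2,147,483,648 bytes. */
static_assert(OFF32_MAX + 1 == INT64_C(2147483648), "2 GiB addressable");

/* Hypothetical helper: true if `offset` is reachable with a signed
   32-bit file offset, i.e. lies in [0, 2^31 - 1]. */
static bool fits_off32(int64_t offset)
{
    return offset >= 0 && offset <= OFF32_MAX;
}
```

The last byte of a 2 GiB file is still reachable; one byte beyond it is not, which is the "2GB barrier" the quote refers to.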

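guidod says: note that fseeko() now supersedes fseek(). fseek() takes its offset as a long, which is only 32 bits wide on typical 32-bit systems, while fseeko() takes an off_t, which can be 64 bits wide when large file support is enabled.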
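A minimal sketch of the off_t-based calls, assuming a POSIX system whose C library honours `_FILE_OFFSET_BITS` (as glibc does): defining it to 64 before any header widens `off_t`, so `fseeko()` and `ftello()` can address offsets past 2 GiB even in 32-bit builds. The helper `seek_and_tell` is hypothetical; `fseeko`, `ftello` and `off_t` are the standard POSIX names.

```c
/* Both macros must precede every system header to take effect. */
#define _POSIX_C_SOURCE 200809L  /* declares fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit glibc */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: move `fp` to absolute position `pos` with
   fseeko() and report the new position via ftello(); -1 on failure. */
static off_t seek_and_tell(FILE *fp, off_t pos)
{
    assert(fp != NULL && pos >= 0);
    if (fseeko(fp, pos, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(fp);
}
```

Passing `-D_FILE_OFFSET_BITS=64` on the compiler command line has the same effect as the `#define` and keeps the setting consistent across all translation units, which matters because mixing 32-bit and 64-bit `off_t` between objects breaks their interfaces.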

use KiB, MiB, GiB, TiB for multiples of 1024; do not use KB, MB, GB for them

I saw this hint as a side note on AJ's LFS status page: there is a long tradition of hard disk makers using decimal size measures, stating sizes in powers of 1000, whereas programmers speak in binary size measures based on powers of 1024. In 1998 the International Electrotechnical Commission (IEC) standardized unambiguous prefixes for binary multiples:

binary (1024) multiple   spoken name   decimal (1000) multiple   original name
1024 Bytes = 1 KiB       kibibyte      1000 Bytes = 1 KB         kilobyte
1024 KiB   = 1 MiB       mebibyte      1000 KB    = 1 MB         megabyte
1024 MiB   = 1 GiB       gibibyte      1000 MB    = 1 GB         gigabyte
1024 GiB   = 1 TiB       tebibyte      1000 GB    = 1 TB         terabyte
1024 TiB   = 1 PiB       pebibyte      1000 TB    = 1 PB         petabyte
1024 PiB   = 1 EiB       exbibyte      1000 PB    = 1 EB         exabyte
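To make the binary prefixes concrete, here is a sketch that scales a byte count down by powers of 1024 until it drops below 1024 and reports the matching IEC symbol. `to_iec` and `iec_units` are hypothetical names; the unit list simply follows the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IEC binary unit symbols, each 1024 times the previous one. */
static const char *const iec_units[] = {
    "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
};
#define IEC_UNIT_COUNT (sizeof iec_units / sizeof iec_units[0])

/* Hypothetical helper: divide by 1024 while the value is >= 1024 and a
   larger unit exists; store the unit symbol in *unit, return the value. */
static double to_iec(uint64_t bytes, const char **unit)
{
    assert(unit != NULL);
    double value = (double)bytes;
    size_t i = 0;
    while (value >= 1024.0 && i + 1 < IEC_UNIT_COUNT) {
        value /= 1024.0;
        i++;
    }
    *unit = iec_units[i];
    return value;
}
```

Printing the result with `printf("%.1f %s\n", value, unit)` shows the hard disk discrepancy directly: a decimal "1 GB" of 1,000,000,000 bytes comes out as "953.7 MiB".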

Note how each binary prefix is derived from the decimal prefix by keeping only its first syllable and appending "bi" to flag the binary measure (kilo becomes kibi, mega becomes mebi, exa becomes exbi). Instead of "kibibyte" one may also say the long form of the name: "kilobinary byte".

Personally, I think the old names will survive in spoken conversation, but when writing numbers down it is best to switch to the binary prefixes standardized (!!) by the IEC. For speech I would have preferred a variant much closer to the original names, such as appending "-s" for the binary meaning, as in "kilosbyte", "megasbyte", "gigasbyte", "terasbyte", "petasbyte" and "exasbyte", while the older decimal meaning could be marked, where clarification is needed, by appending "-n", as in "kilonbyte" ... "teranbyte" ... "exanbyte". The "n" is a half-consonant that is easily swallowed in casual speech, so it only needs to be pronounced when speaking slowly for clarity, and the link back to the old names stays intact. The unsuffixed names would then depend on context for which base they refer to. But this approach smells a lot like a CS grad's idea, whereas the table above is more the physicist's variant.
