Glynx - a download manager. 

SHORT TO-DO

	- More command-line compatibility with lwp-rget
	- Complete user interface


GOALS

	generalize 
		option to use (external) java and other script languages to extract links
		configurable file names and suffixes, filesystem limits
		configurable dump file format
		plugins
		more protocols; download streams
		language support

	adhere to perl standards 
		pod documentation
		distribution

	parallelize things and multiple computer support

	cpu and memory optimizations

	accept hardware/internet failures
		restartable

	reduce internet traffic
		minimize requests
		cache everything

	from perlhack.pod
 		Keep it fast, simple, and useful.
		Keep features/concepts as orthogonal as possible (what's orthogonal?).
		No arbitrary limits (platforms, data sizes, cultures).
		Keep it open and exciting to use/patch/advocate Perl everywhere.
		Either assimilate new technologies, or build bridges to them.


PROBLEMS (not bugs)

	- It takes some time to start the program; not practical for small single file downloads.

	- It should have a graphical front-end; there exists a web front-end.

	- Hard to install if you don't have Perl or have outdated Perl modules. It works fine
	  with Perl 5.6 modules.

	- slave mode uses "dump files", and doesn't delete them.

TESTS

	- test: counting MAX_DOCS with retry
	- test: base-dir, out-depth, site leakage
	- test: authentication
	- test: redirect 3xx 	(www.ig.com.br ?)
	- test: makerel
	- test: makerel with javascript/java
	- test: cookies
	- test: env_ftp
	- test: unknown protocol is a fatal error (on page links)
	- test: folded directories
	- test: escaped save/compare for all URL names 


BUGS

		- restart/stop must rename .grx._BUSY_ => .grx

		- saving short-name AND long-name in name-list - when?

		- modify ftp.pm to return "file/link" information -- save "dir" as _index_.htm

		- looks like save-config doesn't save AUTH

		- slave should spawn if depth > 0 AND filetype = html; 
		- test if dump-file exists - don't overwrite
		- control whether a slave can spawn dump-files
		They could spawn after processing all depth>0, AND only if there were any.


OPTIMIZATIONS

		- cache the dir-list

		- use an optional database connection

		- Persistent connections;
		- take a look at LWP::ParallelUserAgent
		- take a look at LWPng for simultaneous file transfers
		- take a look at LWP::Sitemapper

		- use eval around things do speed up program loading

 		- speed up search using stacks indexed per directory or per site


DOCUMENTATION

		- document the short command-line options

		- FTP proxy


USER INTERFACE

		- how to do user-answered forms?

		- rename "old" .grx._BUSY_ files to .grx (timeout = 1 day?)
		  option: touch busy file to show activity

		- scripting option (execute sequentially instead of parallel).
		POST with interactive mode or from-file

		- perl/tk front-end; finish web front end

 		- save "to-do" file each 10 minutes, so it can restart.

		- timed downloads - start/stop hours

 		- option portuguese/english/other

		- accept --url=http://...
		- accept --batch=...grx

		- arrays for $LOOP,$SUBST; accept multiple URL

		- makerel: make relative links to OTHER sites should be an option
		- makerel: should work on applets.

		- put / / on exclude, etc if they don't have

		- graphical-interface: option iso9660

		- option compress-extension:  .tar.gz -> .TGZ (for iso9660)
		- extension .ab---z -> .ABZ

		- _names_.htm should point to ../_names_.htm ("Up to higher level directory")
		and to subdir/_names_.htm; header = "Directory listing of ... "
		- directories should be of type "DIR"
		- better formatted name-list, as in ftp-dir

		- make a logo
		- include all options, help, in graphical interface
		- graphical interface easier to configure
		- stop-task in cgi (--restart + delete grx file)


PROTOCOL

	- create variable max-link-len (now is 500 bytes)

	- improve forms support (read rfc...)
	- do not press 2 "submits" at the same time; do not press TYPE=RESET
	- explore "options"

		- ignore/accept comments: <! a href="..."> - nested comments???
		but accept javascript

		- should we read vbasic too?

 		- check: 19.4.5 HTTP Header Fields in Multipart Body-Parts
			Content-Encoding
			Persistent connections: Connection-header
			Accept: */*, *.*

 		- pnm protocol: - realvideo, .rpm files, rtsp:

 		- streams

 		- gnutella

 		- 401 Authentication Required, generalize abort-on-error list

		- install and test "https"; do a how-to.

		- 401 - auth required -- supply name:pass

		- implement "If-Range:"

		- better error handling on protocol error, for page links;
		  wrong link "c:\xxx" is a fatal error

		- make auth-digest

		- AUTH should always send nnn:ppp@url for auth-basic (always...)

		- ftp_proxy
		- --proxy option, overriding env_proxy

	The LWP::Simple interface will call env_proxy() for you automatically.
	Applications that use the $ua->env_proxy() method will normally not use the
	$ua->proxy() and $ua->no_proxy() methods.


PERL

		- make it a Perl module (crawler, robot?), generic, re-usable.
		- maybe a "LWP::Restartable"

		- funny Win-NT error "can't find" something:
		  "The system cannot find the file specified" - active perl installation error

		- javascript interpreter option


OTHER

		- name-list for other sites is creating too many empty directories.
		empty-directories should be created only when necessary, and file names
		should be stored somewhere else until the directories are created.

		- "Are we reprocessing the cache?" should trigger a filter to remove all /_INDEX_.HTM

		- should make backup when mirroring (option)

		- finish "my_link"
		- perl "link" is copying instead of linking, even on linux

		- use the name-lookup table to make up for links/redirects

		- lwp-rget "depth" is "0" when we use "1"

		- Doesn't recreate unix links on "ftp". 
		Should do that instead of duplicating files (same on http redirects).
		- http server to make distributed downloads across the internet
		- use eval to avoid fatal errors; test for valid protocols

		- don't ignore "File:" on dump-file

		- change the retry loop to a "while"

		- leitura da configuracao:
		  (1) le opcoes da linha de comando (pode trocar o arquivo .ini), 
		  (2) le configuracao .ini, 
		  (3) le opcoes da linha de comando de novo (pode ser override .ini),
		  (4) le download-list-file
		  (5) le opcoes da linha de comando de novo (pode ser override download-list-file)

		- execute/override download-list-file "File:"
		  opcao: usar --subst=/k:\\temp/c:\\download/


Generalization, user-interface: 

		- --log-headers should be an option
		- option to understand robot-rules
		- make .glynx the default suffix for everything
		- try to support <form> through download-list-file
		- internal small javascript interpreter
		- config comment-string in download-list-file
		- config comment/uncomment for directives
 		- arquivo default para dump sem parametros - "dump-[host]-1"?
		make backup on overwrite dump
		- plugins: for each chunk, page, link, new site, level change, dump file change, 
	  	  max files, on errors, retry level change. Opcao: usar callbacks, ou
		  fazer um modulo especializavel.
		- dump suffix option
		- use environment
 		- aceitar configuracao --nofollow="shtml" e --follow="xxx"
 		- controle de hora, bytes por segundo

		- packing for distribution, include rfcs, etc?

		- installation hints, package version problems (abs_url)

 		- make an object for link-lists - escolher e especializar um existente.

 		- documentar melhor o uso de "\" em exclude e subst
		
 		- Renomear a extensao de acordo com o mime-type (ou copiar para o outro nome).
   		configuracao:	--on-redirect=rename 
                	  	--on-redirect=copy
                	  	--on-redirect=link
				--on-mime=...

 		- tamanho maximo do arquivo recebido
		- usar: $ua->max_size([$bytes]) - nao funciona com callback

 		- disk full or unaccessible / alternate dir

 		- montar links usando java ?

 		- a biblioteca LWP faz sozinha Redirection 3xx ?

 		- are applets always ".class" ?

 		- criar arquivo PART com tamanho zero quando da erro 408 - timeout

 		- como e' o formato dump do go!zilla?


COMMAND LINE OPTIONS

 	- "--proxy=http:"1.1.1.1",ftp:"1.1.1.1"
  		  "--proxy="1.1.1.1"
  		    acessar proxy: $ua->proxy(...) Set/retrieve proxy URL for a scheme: 
  		    $ua->proxy(['http', 'ftp'], 'http://proxy.sn.no:8001/');
  		    $ua->proxy('gopher', 'http://proxy.sn.no:8001/');

	- accept empty "--dump" or "--nodump"

	--backup / --nobackup
			when mirroring, overwriting dump, or reprocessing links.

 	--max-mb=100
 			limita o tamanho total do download

 	--nospace
 			permite links com espacos no nome (ver lwp-rget)

 	--include=".exe" --nofollow=".shtml" --follow=".htm"
 			opcoes de inclusao de arquivos (procurar links dentro)

 	--full ou --depth=full
 			opcao site inteiro

 	--chunk=128000

	--dump-all
			grava todos os links, incluindo os ja existentes e paginas processadas

	--post-separator

------------------
