Newsgroups: biz.comp,biz.jobs.offered,bln.comp.unix,comp.emacs,comp.ibm,comp.jobs,comp.jobs.offered,comp.lang.awk,comp.lang.c,comp.lang.c++,comp.object,comp.org.acm,comp.org.ieee,comp.os.linux,comp.os.minix,comp.os.unix,comp.programming,comp.programm
Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!newspump.sol.net!www.nntp.primenet.com!nntp.primenet.com!netcom.com!nagle
From: nagle@netcom.com (John Nagle)
Subject: Re: + + If you can answer this, do I have a job for you !!!
Message-ID: <nagleE224EH.G1@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <01bbe2c5$83e861e0$238303c7@business.blogic.com>
Date: Sat, 7 Dec 1996 18:51:53 GMT
Lines: 44
Sender: nagle@netcom6.netcom.com
Xref: euryale.cc.adfa.oz.au biz.jobs.offered:676176 comp.emacs:32212 comp.lang.awk:3515 comp.lang.c:177679 comp.lang.c++:204955 comp.object:51315 comp.org.acm:4187 comp.org.ieee:3935 comp.os.minix:27260 comp.programming:33583

"Stephen Wood" <swood@blogic.com> writes:
>Given that you have 500,000 integers in no particular order. 
>You want to retrieve the 11 highest values.
>How would you design you algorithm?
>What if it were the 5 highest values?  the 500 highest values?
>How do you know what to do?
>Thanks,
>Stephen Wood
>Business Logic, Inc.
>question@blogic.com   

     Who is this bozo?  There is no "blogic.com".

     This sounds like someone's homework assignment.

     Hint: If you wanted to find the single highest value, a linear
search would work, and would take O(n) time.  If you wanted to sort
the whole list, that would take longer; O(n log n) is the typical cost
of a comparison sort, although non-comparison sorts such as radix sort
can do better on integers.  If you only need a few values, a search
which maintains a pool of the N highest values seen so far will beat
a full sort.  But as N increases, the cost per test will increase
(unless the pool is kept as a heap, where a replacement costs only
O(log N)), and for some N, sorting will win.
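
     The pool approach can be sketched with a bounded min-heap.
This is a sketch in Python for illustration, not anything from the
original post; the function name and the demo data are mine:

```python
import heapq
import random

def n_highest(values, n):
    """Keep a min-heap of the n highest values seen so far.

    The heap's root is the smallest of the current top n, so each new
    value needs only one comparison against it; a replacement costs
    O(log n).  Total work: O(len(values) * log n).
    """
    heap = []
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

# Demo on the sizes from the original question.
data = [random.randrange(1_000_000) for _ in range(500_000)]
assert n_highest(data, 11) == sorted(data, reverse=True)[:11]
```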

     For medium-sized values of N, ones that are not small relative to
the size of the data set, the problem is interesting, and you could 
develop specialized tree algorithms for that specific purpose.  If
the data set is too big to keep in memory, but the N highest values
would fit, that might actually be worth the trouble, since you could
avoid reading in the data more than once.
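
     The same bounded heap gives you that single pass over data too
big for memory; in Python the standard library's heapq.nlargest
already accepts any iterable, so a file can be scanned without
loading it.  A sketch, assuming one integer per line (the file
format and the function name are my assumptions):

```python
import heapq

def n_highest_from_file(path, n):
    # Single pass over a file too large to hold in memory; only a
    # heap of the current n highest values is ever kept.
    with open(path) as f:
        return heapq.nlargest(n, (int(line) for line in f))
```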

     If you really wanted to do this on a very large, well-shuffled
data set, a good approach might be to read in a small sample of the
data, sort it, extract the N highest values, and use the lowest of
those N as a cutoff point.  Then make one pass over the data set,
keeping everything above the cutoff.  When the kept data pool gets too
large, sort it, toss the low values, and raise the cutoff.
Finally, sort the kept data and extract the N highest values.
This is fast, easy to code, scalable to very large data sets, suitable
for a wide range of values of N, potentially parallelizable, and 
uses existing sort routines.
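
     That sampling-and-cutoff scheme can be sketched as follows
(Python for illustration; the function name, the sample size, and the
pool-size limit are my choices, not part of the description above):

```python
import random

def top_n_by_cutoff(values, n, sample_size=1000):
    """Sampling-and-cutoff selection.

    Sort a small sample, take its n-th highest value as a cutoff,
    then make one pass keeping everything at or above the cutoff.
    When the kept pool grows too large, sort it, toss the low
    values, and raise the cutoff.  A final sort yields the answer.
    """
    if n >= len(values):
        return sorted(values, reverse=True)
    sample_size = max(sample_size, n)   # need >= n points for a cutoff
    sample = sorted(random.sample(values, min(sample_size, len(values))),
                    reverse=True)
    cutoff = sample[n - 1]
    pool_limit = 4 * n                  # "too large" threshold; tunable
    pool = []
    for v in values:
        if v >= cutoff:                 # >= so ties with the cutoff survive
            pool.append(v)
            if len(pool) > pool_limit:
                pool.sort(reverse=True)
                del pool[n:]            # toss the low values
                cutoff = pool[-1]       # raise the cutoff
    pool.sort(reverse=True)
    return pool[:n]
```

Because the sample is a subset of the data, its N-th highest value
can never exceed the data's N-th highest, so no true top-N value is
ever below the cutoff; the trim step likewise only discards values
with N larger ones already in the pool.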

     Anybody find out who the original poster was yet?

					John Nagle
