.nr Ej 1
.nr Hb 4
.nr Hs 3
.ds HP +4 +2 +1 +0 +0 +0 +0
.de xd
.sp 1V
.br
\\$1 \s+2\fB\\$2\fP\s-2
.sp 0.7V
..
.de Bb
.sp 0.5V
.in 0.4in
.nf
..
.de Be
.sp 0.5V
.in -0.4in
.fi
..
.de St
.sp 0.8V
.in 0.4in
\\$1
.in -0.4in
.sp 0.8V
..



.H 1 "Introduction"
.P
At the end of the 1970s, when the Unix\*(Tm operating system was maturing
and gaining popularity throughout the academic world, some graduate
students in North Carolina had a very interesting idea.  They hooked
together multiple Unix computers to form multi-machine \fIbulletin boards\fP --
computerized conferences.
.P
For some time, software had existed which allowed users of
a single machine to access and add to a computerized discussion.  In addition,
multi-machine electronic mail also existed, and some users were already
participating in multi-machine mailing lists which allowed computerized
discussion to take place over networks.
.P
What was new in North Carolina was simple yet powerful.  A system was
created that allowed the discussions to be distributed, and it also
made the sharing of the discussions moderately efficient.  Finally, it
was set up so that anybody who bothered could connect up and participate.
.P
The result was the exponential explosion known as USENET.   USENET is
really just a great number of computers that share a common file format
for exchanging broadcast electronic messages.   Anybody can write a message
on their computer.   Their computer then sends the message to all the
computers it talks to, and they send it to their friends, and those friends
send it on to theirs, and so on, and so on...
.P
This resulted in the world's largest computerized discussion network.
Large as it is, however, this network has no actual existence as an
entity.  Its existence is simply the sum of all the computers that are
willing to forward information around to their neighbours.
.P
There are benefits to anarchy, but there are also drawbacks.  On USENET, there
are few checks and balances to control postings.  Anybody can
post anything, and quite often they do.  As of this writing, USENET
passes over 3,000 different messages per day, with a volume of over
7 megabytes each and every day.
.P
In the early days of USENET, participants could, and often did, read just
about everything.   With hundreds of thousands of participants, that is
no longer even remotely possible.   USENET messages are broken up into
categories called \fInewsgroups\fP.  All messages are categorized into
one or more such groups, and readers only read messages in groups
of their choosing.   As time goes by, even the most enthusiastic readers
are able to read only a smaller and smaller percentage of the total
array of groups.  There are currently over 600 mainstream groups, and the
number is growing.  In addition, there are at least 500 alternate groups,
including the News Wire and electronic publication feeds of ClariNet.
.P
In most cases, even the group selection isn't enough.  Many groups handle
50 or more messages per day.  Readership estimates suggest that over 1
billion times a month, somebody reads or scans a USENET message.  At
2 seconds/message, that amounts to over 700 man hours per day of human
time (at least) and a few million dollars per year.
.P
And that doesn't include the transmission and storage costs at each of
the estimated 15,000 sites.
.P
There's just too much to read, too much to send, and people easily
get swamped.

.H 2 "RN"
.P
Part of this problem is solved by the \fIkill file\fP feature in one of
the more popular news-reading programs.  This program is known as
``RN.''  It is the work of Larry Wall and is given away freely.  RN
users can specify \fIkill files\fP, which contain patterns (regular
expressions) and associated commands.   With this, users can arrange
to not see articles that have certain strings in the header, the subject,
or the entire contents of an article.
.P
This and other news-killing tools have been a great boon to the net,
but no system for filtering news can handle every reader's desire.
.P
Unless, of course, the system is a programming language.
.H 2 "NewsClip"
.P
That's what NewsClip is -- a programming language that lets you examine
news articles and assign them a value.  From that value, NewsClip programs
can decide whether you want to read an article or not.
.P
With NewsClip, you can examine articles with simple expressions or
arbitrarily complex programs, and control your reading through these
programs.
.P
A NewsClip-generated program can filter what you read in a number of
ways.  In one mode, it can run as a standalone program which examines
your \fB.newsrc\fP, evaluates unread articles, and marks undesired articles
as read in the \fB.newsrc\fP.  In another, it can talk directly to some
newsreading programs.   How it does this, however, is unimportant
right now.
.P
The core of a NewsClip program is the \fBARTICLE\fP procedure.  This is
the routine that gets called for every article that is examined.  The
purpose of this routine is to decide whether to accept or reject
an article.
.P
This decision can be arrived at in one of two ways.  Explicit
\fBaccept\fP and \fBreject\fP statements can judge an article immediately.
.P
Alternatively, statements can alter a running \fBscore\fP for the article,
making adjustments to it until the routine finishes.  In this case, articles
that finish with a score greater than zero are accepted, and those with
scores less than or equal to zero get rejected.
.P
Here, for example, is a very simple complete NewsClip program that
rejects all articles posted to more than one group:
.Bb
procedure
Article() {
	if( count(newsgroups) > 1 )
		reject;
	 else
		accept;
}
.Be
Right away you may have noticed that the style of the language is similar
to that of C.  This has been done so that C programmers will adapt easily.
If you're not a C programmer, don't worry,
as the language is still simple enough to learn from scratch -- at least
for all the basic ways of examining news articles.
.P
You will also have noticed the pre-declared array \fBnewsgroups\fP,
the special function \fBcount\fP, and the special statements \fBaccept\fP
and \fBreject\fP.  You'll read more about these later.
.H 2 "What it Means"
.P
You may already be catching on to what NewsClip means for USENET.
With NewsClip you can turn USENET into any sort of network you want.
So long as there's enough information in the articles to classify them,
you can read exactly the USENET you want to see, and little more.
.P
If you don't like certain users, you can eliminate them from your reading.
You can even eliminate all followups to their postings.  Whole sites
can be eliminated or given priority.
.P
Instead of saying what you don't want to see (as with the \fIkill\fP file),
you can specify what you \fIdo\fP want to see.  Or you can specify what
you want for one newsgroup and what you don't want in another.
.P
With the current USENET subscription mechanism, you see everything that's
in a group, even if it's also in other groups.  With NewsClip, you can
program your reading to include only things that are in certain combinations
of groups, or exclude those combinations through any expression of boolean
logic you care to code.
.P
You can give priority to articles posted locally, or those posted by your
favourite writers.  You can see all references to your own articles or
only see articles in netwide groups that were posted strictly for your
country.  All these things can be combined as you like.
.P
If you want to go back to the USENET of yesteryear, you can.  Leave
out the new sites or assign negative scores to newcomers.  It's up to
you.
.P
You can even enforce your own USENET posting rules by rejecting articles
with long signatures, or articles that contain mostly included text.
Or you can do that only for certain users, or certain groups, as you
wish.
.P
In other words, USENET becomes, to you, just what you want it to be.
.P
We're pushing on USENET for better posting programs that put more and
more information about the type of article into the article header.
As more and more such information becomes available, you'll be able
to program better and better criteria for your reading.

.H 2 "Credits"
.P
The NewsClip system was designed by Brad Templeton, and written by
Brad Templeton and Tim Tyhurst of Looking Glass Software Ltd.
The regular expression routines appear courtesy of Henry Spencer of
the University of Toronto.  The USENET date parsing routine appears
courtesy of Steven Bellovin of AT&T.
.P
The manual was written by Brad Templeton.  Problems and correspondence should
be mailed to ``newsclip@looking.on.ca,'' and \fBnot\fP to the authors'
private mailboxes.



.H 1 "Basics"
.P
A NewsClip program consists of declarations, procedures and functions.
There are several procedures which are special, and act as ``entry points,''
so that your program takes control at the right times when articles
are scanned.
.P
A procedure looks like:
.Bb
procedure \fIprocname\fP( \fItypelist\fP )
{
	\fIlocal declarations\fP
	\fIstatements\fP
}
.Be
You will see lots of examples shortly.
.P
All the special procedures are optional, except the one named \fBarticle\fP.
Almost all programs will have global declarations, which can be
interspersed amongst the procedures and functions, but are normally found
at the beginning of the file.
.P
Most programs will only use the \fBarticle\fP procedure.  Some may use
the \fBinit\fP and \fBterminate\fP procedures, which are called when the
news filtering program starts and finishes, and others will use the
\fBstartgroup\fP and \fBendgroup\fP procedures, which are called before
and after a new newsgroup is scanned through.  For full details, see the
section on \fIentry point\fP procedures.

.H 2 "Data Types & Declarations"
.P
NewsClip comes with a small number of data types designed for use
in examining news articles.  They are the same data types that appear
in the headers of news articles, namely integers, dates, newsgroups,
userids, strings and arrays of these types.
.P
There's also a special database type, and special symbols that allow
references to text regions in the body of the article.
.P
Declarations can appear anywhere outside of procedures or functions, 
in which case they are global, or at the
start of the code for any procedure or function, in which case they are 
local to that procedure/function.
.P
To declare a variable, simply give a declaration statement with the
name of the type and the name of the variable, followed by a semicolon.
For example:
.Bb
	int likeit;
	string thename;
.Be

.H 2 "First Examples"
.P
Let's begin by looking at some simple programs.  As we described in the
introduction, the purpose of a NewsClip program is to accept or reject
articles using the \fBarticle\fP procedure.
.Bb
procedure
article() {
	if( is talk.flame )
		reject;
	if( is rec.humor && is talk.bizarre )
		accept;
	 else
		reject;
}
.Be
.P
This program introduces us to some of the most important statements in
the NewsClip language, namely \fBaccept\fP and \fBreject\fP.
.P
They cause the article to be accepted or rejected out of hand.  As soon
as they are encountered, processing of the article stops.  Thus while an
article might satisfy one \fBaccept\fP criterion and another \fBreject\fP
criterion, what actually happens depends on which criterion comes first
in the program.
.P
Thus while our example wants to accept articles crossposted to
``rec.humor'' and ``talk.bizarre'', it doesn't even get to examine those
also posted to ``talk.flame,'' which are rejected by the first statement.
.P
Also introduced here is the \fBif\fP statement.  C programmers will note
that the syntax for it is the same as the one for C, namely:
.Bb
if( \fIcondition\fP )
	\fIstatement\fP
.Be
.P
In the \fBif\fP statement, the condition is evaluated.  If it is true
(which is to say, not equal to zero),
then the following statement is executed.  If the condition is not true, the
statement is skipped.
.P
This program makes use of a special primitive called \fBis\fP.  The
\fBis\fP \fIoperator\fP is always followed by a newsgroup name.  It is
true if the current article was posted to the named newsgroup, and false if it
was not.
.P
Conditional accept and reject are so common that NewsClip contains
some shorthand forms for them, namely \fBreject if\fP and \fBaccept if\fP.
The above program could have been written:
.Bb
	reject if is talk.flame;
	accept if is rec.humor && is talk.bizarre;
	reject;
.Be
You will have noticed by now that all NewsClip statements end in
a semi-colon, as in C.   You can enter your NewsClip statements
in any form you like, on single lines or multiple lines, as long as you
end them with a semi-colon.
.P
The \fB&&\fP operator performs a logical ``and'' on two conditions.
\fBa && b\fP is true if, and only if, both \fBa\fP is true and \fBb\fP
is true.
.P
With these tools, you can accept and reject articles based on what newsgroups
they are in.  This is like the regular ``subscription'' mechanism of
the original newsreaders, although it is more powerful because it allows
the use of logical expressions.
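.P
As a minimal sketch of such a logical ``subscription'' (the choice of groups
here is only illustrative), the following \fBarticle\fP procedure uses nothing
but \fBis\fP, \fB&&\fP and the shorthand forms shown above:
.Bb
procedure
article() {
	/* take everything posted to comp.risks */
	accept if is comp.risks;
	/* drop rec.humor articles that are crossposted to talk.bizarre */
	reject if is rec.humor && is talk.bizarre;
	/* take whatever else is in rec.humor */
	accept if is rec.humor;
	/* nothing matched, so reject */
	reject;
}
.Be
Because processing stops at the first \fBaccept\fP or \fBreject\fP that
fires, the order of the statements matters.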

.H 2 "Compiling"
.P
\fI(We assume that your system has already been installed.  System admins
and binary licensees should check out the installation appendix for details
on how to install.)\fP
.P
Now that you know how to put together a simple NewsClip program, you
will want to be able to compile it.  This is done with the command
\fBncc\fP, for ``NewsClip Compiler.''
.P
Normally, you will place your programs in files with the extension
``\fB.nc\fP,'' just as C programmers use the ``\fB.c\fP'' extension.
.P
Say you have a program in the file \fBmyclip.nc\fP.  To compile, simply
issue the command:
.Bb
ncc myclip.nc
.Be
This will produce an executable program called \fBnclip\fP in the current
directory.

.H 3 "Executing"
.P
There are many ways that the generated news clipping programs can be used.
The one you will probably use first is \fInewsrc\fP mode.  This is meant to
work with news reading programs that keep track of read articles in
a file called \fB.newsrc\fP.
.P
Simply say:
.Bb
nclip mode=newsrc
.Be
or simply:
.Bb
nclip m=n
.Be
and your \fBnclip\fP program will examine the \fB.newsrc\fP file to
look for unread, unprocessed articles.  It will then examine all these
articles with your filtering program.  Rejected articles will be marked
as read, and accepted articles will be marked unread.  When it is done,
the \fB.newsrc\fP file will be updated.
.P
When you next run your newsreading program, you will not see the rejected
articles.
.P
Your \fBnclip\fP program has a number of other modes of operation.  One
takes a list of article filenames and filters out the rejected ones.
.P
In another mode, your program can talk to specially adapted newsreader
programs through pipes, allowing the filtering to take place as you
read news.  This makes your filter program seem part of your newsreader.
.P
These other modes are detailed in a special chapter.  For now, just run
your \fBnclip\fP program as described above before your newsreading
session.  You should perhaps make a shell script or shell alias to do
this for you:
.Bb
nclip mode=newsrc
rn
.Be
.H 3 "List Mode"
.P
When you are starting off, you may not wish to run your test NewsClip
programs on your real \fB.newsrc\fP file.  One way to protect it is
to copy it elsewhere before running tests.   Another idea is to run
your clipping program in \fIlist\fP mode with:
.Bb
nclip mode=list
.Be
.P
In this mode, the \fB.newsrc\fP file is examined as always, but is not
updated.
Instead, a list of the filenames of accepted articles is written to the
standard output.  You can look at these files yourself to see if your
program is doing what you want.
.H 2 "More Examples"

.H 3 "Externals"
.P
To do much more than this, you will have to learn how to declare and,
in particular, to import variables.   Most of the information about a
news article is contained in the header.  With each article, NewsClip
examines the header,
storing the various fields you are interested in
into variables which you can use in your expressions.
.P
To use these variables, you must ``import'' them with the \fBextern\fP
declaration.
.Bb
extern userid From;		/* the From: line */
procedure
article() {
	if( From == "brad@looking.on.ca" )
		accept;
	 else
		reject;
}
.Be
Here we see both the use of an imported \fBuserid\fP variable, and
the concept of string comparison.  \fBUserid\fP variables, like \fBFrom\fP,
act like \fBstring\fP variables in just about every way.   In this case,
a \fBuserid\fP variable is compared with the constant string that is the author's mail address.
.P
This program would accept any article posted by myself, and reject all others.
Wise choice!

.H 3 "If-else"
.P
In this example, you also see that \fBif\fP has an \fBif-else\fP form
such as C programmers might expect.  If the condition is true, the statement
after the
\fBif\fP is executed.  Otherwise, the statement after the \fBelse\fP
is executed.
.H 3 "Comments"
.P
You may also have noticed the comment on the declaration.  Comments start
with \fB/*\fP and end with \fB*/\fP.  They may be placed anywhere
within a program.  They are ignored by the compiler.  Note that, as in
C, you may not nest comments.  The very first \fB*/\fP within a
comment terminates the comment.
.H 3 "More Externals"
.P
The \fBFrom\fP variable is just one of many special ``header'' variables
which you can import with the \fBextern\fP declaration.   There are
special header variables for each common USENET news header line.  Each
has the same name as the header name, except dashes are replaced with
underscores where necessary.
.P
You must declare these header variables with the appropriate type.  For
example, \fBFrom\fP is always a \fBuserid\fP, while \fBmessage\_id\fP
is always a \fBstring\fP.  The other major header fields that are commonly
used are declared by:
.Bb
extern string subject;		/* The subject line */
extern string message\_id;	 /* Unique message identifier */
extern int lines;		/* number of lines in article */
extern string array references; /* message-ids of parent articles */
extern newsgroup array newsgroups;	/* the newsgroups line */
.Be
Importation of the last one, \fBnewsgroups\fP, is optional.  It's
always available whether you declare it or not.
.P
Header variables may only be imported at the global level, because importing
them has global implications.   Other variables may be imported both
at the global level, and within procedures and functions.
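.P
The following sketch shows how several imported header variables might be
combined in one program.  The line limit and the subject string are
arbitrary values chosen for illustration.
.Bb
extern string subject;		/* The subject line */
extern int lines;		/* number of lines in article */
procedure
article() {
	/* skip very long articles */
	reject if lines > 200;
	/* keep one particular subject */
	accept if subject == "the rain in spain";
	reject;
}
.Be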
.H 3 "Array"
.P
Two of the above header variables are arrays.  In NewsClip, you can
have arrays of any of the 5 basic types.  They are declared by using
``\fItypename\fP \fBarray\fP'' in place of a simple \fItypename\fP,
as shown above, for example, in \fBstring array references\fP.
.P
Arrays can be indexed, their size can be taken, and you can test for
the presence of other values (such as single scalar variables) in an
array.  You will find them very useful in certain applications.
.P
For now, the only arrays you can use are pre-defined header array
variables.  Later, you will read how to create your own arrays.  If
you declare an array and use it without giving it a dimension, your
program may bomb.

.H 3 "Variables & Assignment"
.P
You can, of course, declare your own variables, either globally,
or locally in each routine, including
the \fBarticle\fP routine.   Just like in any programming language,
these variables can be used and assigned to.
.P
The simple assignment statement looks like:
.Bb
\fIvariable\fP = \fIexpression\fP;
.Be
.P
Two other assignment operators are the increment and decrement operators,
which can only be used on integer variables.  \fBvar++;\fP is the same
as \fBvar = var + 1;\fP, while \fBvar-\-;\fP is the same as
\fBvar = var - 1;\fP.
.P
Unlike C, these assignment forms can only be used in independent
statements.  They \fBcan't\fP be used in the middle of an expression.
For example, this means that the common C habit of using an assignment as
the condition of an \fBif\fP statement (e.g. \fBif( var = a + b )\fP)
is not allowed.
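.P
As a small sketch of these rules (the variable name \fBpoints\fP is made up
for the example), the following program keeps its own tally in a local
variable, using each assignment as a statement of its own:
.Bb
extern userid from;
procedure
article() {
	int points;
	/* plain assignment */
	points = 0;
	/* increments stand alone as statements */
	if( is rec.humor )
		points++;
	if( from == "brad@looking.on.ca" )
		points++;
	/* compare in the condition; never assign there */
	if( points > 0 )
		accept;
	 else
		reject;
}
.Be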

.H 3 "Identifiers"
.P
You may have noticed that in NewsClip, just about anything, except
for constant strings, can be in any case.  All upper case in identifiers
and keywords is mapped to lower case by the compiler.  This is not like
C, where the very similar looking \fBHello\fP and \fBhello\fP can be two
different variables.
.P
Otherwise, identifiers must start with a letter, and may consist of letters,
digits and the underscore character.  Names are significant to one less
character than your C compiler allows.  Most modern C compilers have
no limit.   Some older ones have a limit of 7.  If yours does, get a new
C compiler.

.H 2 "Data Types"
.P
Before going further, a more formal discussion of the NewsClip data
types is in order.
.H 3 "Integer"
.P
The \fBint\fP type and the \fBdatetime\fP type are integer types.  The
size of the \fBint\fP is machine dependent.  The size of the \fBdatetime\fP
type will always be large enough to hold date values, which are the number
of seconds since midnight GMT, Jan 1, 1970.

.H 3 "String"
.P
The \fBstring\fP type declares variables that are character strings.  You
can compare them with other strings, pass them as arguments to
subroutines, search for patterns in them or use them as search patterns.
Two important things to remember about strings are:
.AL

.LI
They are usually temporary, and only last for the duration of the processing
of a single article.
.LI
String variables actually just point to strings; they aren't the strings
themselves.  If you assign one string variable to another, they both refer
to the very same physical string in memory.
.LE
.H 3 "Userid"
.P
Variables of type \fBuserid\fP are special forms of strings.  They are only
derived from header lines like the \fBFrom:\fP line of an article.  If
you refer to a \fBuserid\fP variable, you will get the string that is
the return mail address section of the \fBFrom:\fP line.   With the
\fBrealname\fP function, you can extract the user's ``full name.''
.H 3 "Newsgroup"
.P
The \fBnewsgroup\fP type is a special and important type.  Internally,
\fBnewsgroup\fP variables are integers.  When NewsClip programs
process news articles and news files, all relevant newsgroups are
assigned unique newsgroup numbers.  These numbers are what get stored
in \fBnewsgroup\fP variables.
.P
The unique thing about \fBnewsgroup\fP values is that you can use them
in expressions wherever a string is required.  If you do, the newsgroup
number will be mapped to the string that is the newsgroup name.  Thus
you can compare a string with a newsgroup, assign a newsgroup to a
string variable, search text using a newsgroup as a string, or even search
for patterns in a newsgroup or array of newsgroups.
.P
The only thing you can't do is assign a string value to a newsgroup
variable.
.P
Newsgroup constants can be expressed in the source code by prefacing a
newsgroup name with a ``\fB#\fP.''  These constants get replaced with
the appropriate newsgroup number.  For example:
.Bb
newsgroup mygroup;
mygroup = #rec.humor.funny;
.Be
While there is no official definition of what characters can go in
a newsgroup name, NewsClip allows only alphanumerics, the dot, dash,
underbar, plus and minus characters.
.P
You can also assign a newsgroup value to an integer variable to extract
the newsgroup number, although this is not usually useful.
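.P
A short sketch of these conversions, assuming only the rules described
above (the variable names are invented for the example):
.Bb
newsgroup mygroup;
string name;
int num;
mygroup = #rec.humor.funny;
name = mygroup;		/* the name, "rec.humor.funny" */
num = mygroup;		/* the internal newsgroup number */
.Be
The reverse assignment, from \fBname\fP back to \fBmygroup\fP, is the
one that is not permitted.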

.H 3 "Array"
.P
As noted, you can declare arrays of any of the above types.  Normally
you will not have to define your own arrays.  The only ones you are likely
to use are the special array header variables.
.P
You can index into arrays with square brackets, as in C.  For example,
\fBnewsgroups[0]\fP is the first newsgroup on an article's newsgroups
line.  Array indices should be integer expressions, and the first element
in an array is index 0.  No check is made on indexing.  If you index beyond
the bounds of an array, your program can crash.
.P
Before using a user-declared array, you must dimension it with a statement
like:
.Bb
myarr = array 10;
.Be
where you can provide any integer expression for the size.  Indices go
from 0 to 9 in the above case.
.P
You can get the size of an array with the \fBcount\fP operator.  As
shown in the very first sample program in the introduction,
\fBcount(newsgroups)\fP gives the number of newsgroups an article was
posted to.
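.P
Putting indexing and \fBcount\fP together, a sketch like the following
rejects heavily crossposted articles and then looks at the first group
listed.  The limit of 4 is an arbitrary choice, and the sketch assumes
every article lists at least one group, so that index 0 is in bounds.
.Bb
procedure
article() {
	/* too many groups on the newsgroups line */
	reject if count(newsgroups) > 4;
	/* index 0 is the first group listed */
	accept if newsgroups[0] == #rec.humor.funny;
	reject;
}
.Be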
.P
There is a special variant of the \fBfor\fP loop (described later) that
works with arrays, and the new \fBin\fP and \fBhas\fP operators are
designed to work with arrays.  More on these later.
.H 3 "Database"
.P
The \fBdatabase\fP type is not used in any header items, but you will
find it very useful in keeping track of collections of items, and for
remembering information from past articles.  The \fBdatabase\fP is described
in a special section of its own.

.H 2 "For Loop"
.P
NewsClip comes with two kinds of for loops.  The first is just like the
one from C.  The syntax is:
.Bb
for( \fIassignment\fP ; \fIcondition-expr\fP ; \fIassignment\fP )
	\fIstatement\fP;
.Be
The first assignment statement gets executed once, at the start of the loop.
.P
Then, for each execution of the loop, the \fIcondition\fP is evaluated.  If it
is true, the loop \fIstatement\fP is executed.
If it is false, the loop terminates.
Note that if the condition is false at the very start, the loop code is
never executed.
.P
After each execution of the \fIstatement\fP, the second \fIassignment\fP is
executed.  Then the \fIcondition\fP is evaluated again to see if
the loop should continue.
.P
A typical use of this is a counting loop:
.Bb
for( counter = 0; counter < 10; counter++ )
	myarray[counter] = counter * counter;
.Be
.P
The second form of the \fBfor\fP loop may be more useful in dealing with
arrays and databases.  The syntax is:
.Bb
for( \fIvariable\fP in \fIarray/database\fP )
	\fIstatement\fP;
.Be
In this case, the statement is executed for every value in the array or
database.
(See the database chapter for more details on the latter.)   With each
execution, the variable is assigned the value of successive entries in
the array.
.P
This means, of course, that the type of the variable must be the same as the
constituent type of the array.
.P
For example:
.Bb
for( n in newsgroups )
	reject if n == #talk.flame;
.Be
is the same as:
.Bb
for( i = 0; i < count(newsgroups); i++ )
	reject if newsgroups[i] == #talk.flame;
.Be
It's just simpler and more readable.
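.P
The same form should work on other header arrays.  As a sketch, this program
accepts only followups to one particular article by scanning the
\fBreferences\fP array.  The message-id shown is a made-up placeholder, and
the sketch assumes the array is simply empty when an article has no
parent articles.
.Bb
extern string array references; /* message-ids of parent articles */
procedure
article() {
	string r;
	/* accept as soon as the wanted parent is found */
	for( r in references )
		accept if r == "<1234@looking.on.ca>";
	/* no match among the parents */
	reject;
}
.Be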

.H 3 "While Loop"
.P
There's also a \fBwhile\fP loop, intended for more advanced programming.
Its form is identical to that of the C language \fBwhile\fP loop, and it is
described in the \fBstatements\fP section of the manual.

.H 2 "Compound Statements"
.P
Where we have described the \fBfor\fP and \fBif\fP statements above,
only a single statement has been shown as affected by the condition or
the loop.
.P
Quite often you will want your loops and conditions to perform more
than just single statements.  To do this, you
use the ``compound'' statement, which combines multiple statements into
a single unit.
.P
To do this, put curly braces around the group of statements, like so:
.Bb
{
	n = #talk.bizarre;
	accept if subject == "the rain in spain" && newsgroups[0] == n;
}
.Be
Usually you will do this with something like an \fBif\fP, as in:
.Bb
if( is news.admin ) {
	accept if from == "brad@looking.on.ca";
	reject if from == "ihate@bad.site.com";
	reject if lines > 100;
	}
.Be
Where you place your braces, and how you indent, is up to you.  I prefer
the above style, but the choice of style is really a matter of ``religion.''
.H 2 "Summary"
.P
With these tools, and the \fBswitch\fP statement described in the
next chapter, you will be ready to construct a typical NewsClip
program to control your newsreading.



.H 1 "A Typical Program"
.P
In the previous chapters, you learned the basics of how
a NewsClip program operates.   You are now ready to write a
more sophisticated clipping program.
.P
As discussed before, the purpose of your program is to accept or
reject articles.  You should decide whether you are going to do this
by means of explicit \fBaccept\fP and \fBreject\fP statements, or
whether you would like to use a series of statements that calculate
a ``value'' or \fBscore\fP for each article.
.P
If you are going to calculate a \fBscore\fP, you will use the \fBadjust\fP
statement.  The \fBadjust\fP statement adds the expression provided to
the article's score.  For example:
.Bb
if( is talk.bizarre )
	adjust -10;
.Be
subtracts 10 points from any article posted to talk.bizarre.
.Bb
if( from == "fbaggins@shire.midearth" )
	adjust 30;
.Be
adds 30 points to any article posted by Frodo Baggins.
.P
Your \fBarticle\fP procedure might consist of a series of conditional
adjustments to the score of the article.  The score starts with a value
of one (1), which means that the default is to accept.  If, at the end,
the score is still 1 or higher, the article is accepted.  Otherwise
it is rejected.
.P
If you are using the \fBscore\fP method, you can still use explicit
\fBaccept\fP or \fBreject\fP statements at any time.
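.P
To make the arithmetic concrete, here is a sketch that combines the two
adjustments above with an explicit \fBreject\fP (the 500 line limit is an
arbitrary value chosen for illustration):
.Bb
extern userid from;
extern int lines;
procedure
article() {
	/* the score starts at 1 */
	if( is talk.bizarre )
		adjust -10;		/* 1 - 10 = -9 */
	if( from == "fbaggins@shire.midearth" )
		adjust 30;		/* -9 + 30 = 21, or 1 + 30 = 31 */
	/* an explicit reject overrides any score */
	reject if lines > 500;
}
.Be
An article in talk.bizarre from anyone else finishes at -9 and is rejected.
One from Frodo finishes at 21 or 31 and is accepted, as is any other
article, which keeps its starting score of 1.  The \fBreject\fP applies
to overly long articles no matter what their score.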

.H 2 "Sections & Switch"
.P
A typical clipping program's \fBarticle\fP procedure will consist of
both global conditions and special conditions that only apply in certain
newsgroups.  While you could do the latter with a series of statements
like this:
.Bb
if( group == #news.admin ) {
	/* statements for news.admin */
	}
else if( group == #news.groups ) {
	/* statements for news.groups */
	}
else ...
.Be
this is bulky and inefficient.
.P
The \fBswitch\fP statement lets you make multi-way decisions based on
the value of a single expression.  With \fBswitch\fP, you can test whether
the value of an expression matches a variety of constant values, and execute
a different piece of code for each constant value.
.P
A typical switch looks like:
.Bb
switch( \fIexpression\fP ) {
	case \fIconst1\fP:
		\fIstat1\fP;
		break;
	case \fIconst2\fP:
		\fIstat2\fP;
		\fImore stats\fP;
		break;
	default:
		\fIdefault-stats\fP;
		break;
	}
.Be
It is essentially a large compound statement with ``case labels'' inside.
The program jumps to the right case label for the value of the \fBswitch\fP
expression, and executes on down from there until it hits a transfer
statement like \fBbreak\fP, \fBaccept\fP, \fBreject\fP, etc.
.P
You will see a switch, in this case on the newsgroup, in our sample program
below.
.H 2 "The Example"
.P
A typical program will contain a declarations section where you declare
header variables and other externals, as well as your own global variables.
.P
This will be followed by an article section.  At the top of the article
section, you can declare local variables, and perform any per-article
initialization of variables that is required.
.P
This should be followed by the tests you want to make on all articles,
regardless of newsgroup.
.P
After this, you will want to place a for loop that scans each newsgroup
in the \fBnewsgroups\fP array.  For each newsgroup, you will want to
perform conditional tests based on the newsgroup and what you like to
see or not see within it.  You do this with a large \fBswitch\fP statement
and individual \fBcase\fP statements for each newsgroup constant.
.P
After the \fBfor\fP loop and its \fBswitch\fP, you can include any further
tests which you wish to apply to all articles that have not already been
accepted or rejected.
.P
Here is a sample program.  In this program, you'll also see the new \fBhas\fP
operator, which does regular expression pattern matching.

.Bb
extern string subject;
extern userid from;
extern newsgroup array newsgroups;
extern int followup;			/* is it a followup? */
procedure
article() {
	newsgroup n;
	/* every article code */
	/* I like articles from this guy */
	if( from == "goodguy@nice.site.com" )
		adjust 10;
	/* but I prefer to avoid followups */
	if( followup )
		adjust -6;
	/* if the article is in alt.flame, forget it.  This could
	   also be in the case section */
	if( is alt.flame )
		adjust -8;
	/* now the newsgroup specific stuff */
	for( n in newsgroups ) switch( n ) {
		case #news.admin:
			if( subject has "voting" )
				adjust 20;
			else if( from has "badsite.edu$" )
				reject;
			break;
		case #rec.humor:
			/* adjust the score of messages that are crossposted
				to groups you don't like */
			if( is talk.bizarre || is alt.flame )
				adjust -10;
			break;
		case #sci.physics:
			/* I only want to see messages that are crossposted
			   to both sci.physics AND sci.astro, not just one
			   of them */
			reject if !is sci.astro;
			accept;
			/* no break; needed after an unconditional accept */
		case #comp.risks:
		case #rec.arts.sf-lovers:
			/* my favourite groups */
			adjust 20;
			break;
		case #alt.sex:
			reject;	/* don't show me anything here */
		default:
			/* in the other groups, if the article is heavily
			   crossposted, drop a point for each group it is
			   crossposted to. */
			if( count(newsgroups) > 3 )
				adjust -count(newsgroups);
			break;
		}
}
.Be
.P
Note that because of the \fBfor\fP loop on the \fBnewsgroups\fP array,
the \fBswitch\fP gets executed for each group.  That means that
if the article is crossposted to 5 groups, 5 different cases will get
executed.   Of course, if any of the cases does an \fBaccept\fP or
\fBreject\fP, the processing stops right there.
.P
Alternatively, you may prefer to skip the \fBfor\fP loop and just
\fBswitch\fP on the variable \fBmain\_newsgroup\fP -- the current
newsgroup in a \fB.newsrc\fP scan or newsreader session.  You might
also switch on \fBnewsgroups[0]\fP, the primary newsgroup in the
\fBnewsgroups\fP list.
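.P
A sketch of that simpler form follows.  It assumes \fBmain\_newsgroup\fP is
declared as an external of type \fBnewsgroup\fP, which is not shown in this
section; the groups and scores are only illustrative.
.Bb
extern newsgroup main\_newsgroup;	/* declaration form assumed */
extern int followup;
procedure
article() {
	/* one case runs per article: the group being read */
	switch( main\_newsgroup ) {
		case #rec.humor:
			if( followup )
				adjust -6;
			break;
		case #comp.risks:
			adjust 20;
			break;
		default:
			break;
		}
}
.Be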
.P
Some sample template programs are included with the NewsClip system
to get you going.  They can be found in a special directory that
was created by the person on your machine who installed the program.
That directory might be \fB/usr/lib/news/newsclip\fP or it could be
somewhere else.   Check the ``man'' page for the ``nclip'' program
by issuing the shell command \fBman nclip\fP to find out what
directory to look in.
.P
A simple shell like the one above can be found in the file \fBshell.nc\fP
in that directory.  More complex programs can also be found there.
.P
From this point, all you have left to learn in order to write sophisticated
news clipping programs are the various special variables, functions and
operators that help you make your conditional expressions.  These
are documented, along with other tips and tricks, in the following
chapters.




.H 1 "Operators & Searching"

.H 2 "Integer Operators"
.P
You have already been shown examples of a variety of integer operators
without explanation.   That's because we expect the use of these operators
to be fairly obvious to anybody who has done a little programming.
.P
NewsClip allows integer expressions just like C.  In fact, the priorities
of the operators are exactly the same.
.P
Operators allowed are \fB+\fP (addition), \fB-\fP (subtraction and unary
negation), \fB*\fP (multiplication), \fB/\fP (integer division) and
\fB%\fP (integer modulus).
.P
There are also some bitwise integer operators, namely \fB|\fP (bitwise or),
\fB&\fP (bitwise and) and \fB^\fP (bitwise exclusive or).
.P
Logical operators include \fB!\fP (unary not), \fB&&\fP (logical and)
and \fB||\fP (logical or).  The logical ``and'' and ``or'' operators
short-circuit: the right-hand operand is evaluated only if it can still
affect the result.
Thus if you have \fBa && b\fP, \fIa\fP is evaluated, and if it is false,
\fIb\fP is not evaluated as it can't change the result of the expression.
.P
Logical arithmetic is done with integer values.  Non-zero values represent
true, and the value zero represents false.  In most cases, true is represented
by one.  In fact, there are two predefined constants, \fBtrue\fP and
\fBfalse\fP, which are 1 and 0, respectively.
.P
There are several comparison operators.  They return 1 or 0 depending on
whether the comparison is true or false.  For integers, the operators include
\fB==\fP (equality), \fB!=\fP (inequality), \fB>\fP (greater than),
\fB<\fP (less than), \fB>=\fP (greater or equal) and \fB<=\fP (less or
equal).
.P
String, userid and newsgroup values can also be tested for equality or
inequality.  String equality is case sensitive, but most strings from
headers are converted to lower case in advance so that caseless matching
can be done.  When testing for string equality on variables like
\fBfrom\fP, always use lower case strings.  For example,
\fBfrom == "foo@bar.uucp"\fP will work while \fBfrom == "foo@bar.UUCP"\fP
will never match.
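.P
The fragment below is a small sketch that combines the operators described in
this section.  It uses only names that appear in the sample program of the
previous chapter; the address and the score values are illustrative.
.Bb
extern userid from;
extern newsgroup array newsgroups;
extern int followup;			/* is it a followup? */
procedure
article() {
	/* count() is called only when followup is non-zero */
	if( followup && count(newsgroups) > 3 )
		adjust -2 * count(newsgroups);
	/* header strings are lowercased, so compare in lower case */
	if( from == "foo@bar.uucp" )
		adjust 5;
}
.Be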

.H 2 "Pattern Matching (has)"
.P
An important tool in judging news articles is ``searching'' or
\fIpattern matching\fP.  Most
of an article is simply strings and text, and often you will wish to
find out if certain words, strings or phrases are contained within
this text.
.P
Pattern matching is done with the \fBhas\fP operator.  You can ask
whether a string ``has'' a pattern, which is to say whether something
that matches the pattern is contained within the string.  For example,
.Bb
subject has "fusion"
.Be
is true if the word ``fusion'' appears anywhere in the \fBsubject\fP, and
false otherwise.
.P
While you can get a lot done searching for words and substrings like
``fusion'', NewsClip actually supports searching for much more
complex patterns, defined by a language known as ``regular expressions.''
.P
If you have used any of the popular Unix text editors such as ``vi'' or
even ``ed,'' you will be somewhat familiar with regular expressions.
Regular expressions allow you to search not just for a simple string,
but variations of the string.
.P
NewsClip uses the regular expression language used by the Unix searching
command \fBegrep\fP.  This is a superset of the language used by the
\fBed\fP text editor.   If you don't know about regular expressions
at all, we advise you to read about them in the manual for \fBed\fP
or \fBgrep\fP.  If you are already familiar with \fBed\fP, then
you can read the description of the extensions to that language found
at the end of this chapter.
.P
Note that the ``or'' operator (\fB|\fP) in regular expressions is
currently not very efficient.
.P
In general, the characters in the set ``\fB^$.[]()+?|\\*\fP'' have
special meanings.  If you want to match one of those characters
literally, you must preface it with a backslash.  For reasons too confusing
to be explained here, if you want to match a literal backslash, you must
use four (4) backslashes in quoted pattern strings.

.H 3 "Searching"
.P
The most common place to search will probably be the \fBsubject\fP line.
This is easy, and an example is shown above.  You can also search
\fBuserid\fP variables (although you will only search the mail address
part) or other strings.   You can also search a \fBnewsgroup\fP variable,
since they are always converted to strings when necessary.
.P
It is possible to search in arrays.  For example,
\fBnewsgroups has "^comp"\fP will tell you if any of the newsgroup names
starts with ``comp'' -- i.e. if any are in the computer hierarchy.
.P
You can also search databases -- see that chapter for more details.
.P
Patterns need not be constant strings.  They can also be variables and
expressions of the string, userid or newsgroup types.   In fact, they
can even be arrays of those types.
.P
If you search for an array of patterns, you will get a true result if
any of the patterns is found in the area you're searching.   You can
even search for an array of pattern-strings in another array of
strings.
.P
Searching has all sorts of general uses.  For example,
\fBfrom has "@bad.site.edu$"\fP tells you if the article was posted
from the specified site.   The test \fBsubject has "^re:"\fP tells you if the
subject was generated in a followup, and \fBdistribution has "^usa"\fP
tells you if one of the article distributions was the USA's national one.
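.P
As a sketch, tests like these might be combined in an article procedure as
follows.  The scores are arbitrary, and the dots in the site name are left
unescaped for readability, so each one matches any single character.
.Bb
extern string subject;
extern userid from;
extern newsgroup array newsgroups;
procedure
article() {
	/* posted from the site we dislike */
	if( from has "@bad.site.edu$" )
		reject;
	/* a followup that appears somewhere in the comp hierarchy */
	if( newsgroups has "^comp" && subject has "^re:" )
		adjust -2;
}
.Be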
.P
Naturally, you can combine searches with logical operators.  For example,
.Bb
\fBsubject has "star wars" && subject !has "reagan|sdi"\fP
.Be
might help you
find articles about the movie, but not about the laser defence system.
\fB!has\fP, if you didn't guess, checks to see if a pattern is \fBnot\fP
in the specified string.
.P
Note that pattern matching is case sensitive.  Most header items you
will search, such as the subject, summary and from lines, are lowercased
in advance.  You should thus only search for lower case letters in such
regions.  If you set the \fBpreserve\_case\fP flag, mapping of header
sections to lower case will not be done.
.Bb
extern int preserve\_case;
preserve\_case = true;
accept if subject has "Reagan";
.Be

.H 4 "Typing in Patterns"
.P
Patterns will normally be constant strings.  This presents an
interesting problem when attempting to escape special characters.
.P
First of all, it is possible to include escape sequences, which start
with backslash, inside constant strings.  For example, you can insert
a quote with \fB\\"\fP, and things like newlines, tabs and general
characters with strings like \fB\\n\fP, \fB\\t\fP and \fB\\010\fP --
all the standard escapes for constant strings in the C language.
.P
The backslash character is also used in patterns to escape regular
expression metacharacters like ``\fB$\fP'' and ensure they get matched
as literal characters.  If we followed the C rules, you would need to
type two backslashes before everything you wish to escape.
.P
Fortunately the C escape characters and the pattern matching metacharacters
don't overlap, except in one place, the backslash itself.
.P
Thus if you type something like \fB\\$\fP in your constant string, it is
mapped to a real backslash and a real dollar sign, instead of just a dollar
sign the way C would.   You can thus type in your patterns much the
same way as you would type them to a text editor.  If you want to type
two backslashes before your \fB$\fP, you still can, though.
.P
The one problem is backslash itself.  It needs to be escaped in both
methods.  This means that if you want to get a pattern that matches
a real backslash, you need four (count 'em, four) backslashes.  Fortunately,
you don't have to do this a lot.
.P
Examples:
.Bb
str has "abc$"		/* matches abc at end of line */
str has "abc\\$"	 /* matches abc$ */
str has "abc\\\\$"	  /* also matches abc$ */
str has "abc\\\\\\\\$"	    /* matches abc\\ at end of line */
.Be

.H 3 "Searching the Body"
.P
You may wish to search for more than just items from the article header.
If you want to really check over an article, you can perform searches in
the body or ``contents'' of the article.
.P
The NewsClip language defines 5 different regions of the article body
that you can search in.  Special pre-declared names have been given to
these regions.  These names can only be used with the \fBhas\fP operator
and some special functions.
.P
These names are:
.sp 0.5V
.in 0.4in

.sp 0.7V
.ti -0.4in
body
.P
The entire text of the article, not including the header, but including
the signature.
.sp 0.7V
.ti -0.4in
text
.P
The text of the article, not including the signature.
.sp 0.7V
.ti -0.4in
signature
.P
The signature of the article, but none of the text.
.sp 0.7V
.ti -0.4in
newtext
.P
The text of the article that is original, which is to say not included
from a previous article.  The signature is not included.
.sp 0.7V
.ti -0.4in
included
.P
The text of the article that was included from a previous or parent article.
.in -0.4in
.P
For example, \fBtext has "hello"\fP would return true for articles that had
the word ``hello'' anywhere in their text, but not if it appeared in the
signature section of the article.
.Bb
reject if body has "compact disc";
.Be
would reject all articles that contain that string.
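.P
A short sketch of how the regions can be told apart in practice; the phrases
and scores are illustrative only.
.Bb
if( newtext has "fusion" )
	adjust 5;	/* the poster wrote something new about it */
else if( included has "fusion" )
	adjust 1;	/* it is only mentioned in quoted material */
reject if signature has "for sale";	/* checks the signature alone */
.Be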

.B "Important Note:"
.P
Pattern matching is case sensitive, but normally all sections of the
article body are lower cased before searching is done.  Thus you should
specify all your search strings in lower case.  There is an integer
variable named \fBpreserve\_case\fP, which if declared and set true,
stops the mapping of the article body and various header sections to lower
case.  If you set this
flag, your patterns must be in the exact case you're looking for.

.H 3 "Body Parts"
.P
As you may know, there are no official definitions on USENET for what
separates an article's text from its signature, or even for what distinguishes
lines included from a previous article from original lines.
.P
In order to make these distinctions, the NewsClip text processing library uses
some special patterns to spot included lines and the start of a
signature.
.P
The default signature pattern is a line that starts with 2 dashes, and is
followed by any number (including 0) of spaces and the end of the line.
The regular expression is ``\fB^-\- *$\fP.''  You can change this
pattern with the special procedure \fBset\_signature\_start\fP.
You might try
.Bb
set\_signature\_start( "^(-\-\-*|====*)$" );
.Be
to match lines of dashes or
equals signs -- whatever suits your fancy.
.P
Included lines are deemed to be those that start with a special pattern.
The default is the pattern ``\fB[>:%#]\fP,'' meaning any one of those
four characters.  The styles used vary from user to user, and no one style
will be correct.  If you want to get really fancy, you can set the pattern
according to the poster using \fBset\_include\_prefix\fP.   It is perhaps
safest just to use the ``>'' character, which is the default used by
most news posting programs.  You do not need to worry about white space
in front of the pattern.  That is removed before the pattern check is done.
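.P
For instance, to treat only ``>'' as the include marker, a call along these
lines should do.  This is a sketch: it assumes \fBset\_include\_prefix\fP
takes a single pattern string in the same way as \fBset\_signature\_start\fP,
which this section does not spell out.
.Bb
set\_include\_prefix( ">" );	/* argument form assumed */
.Be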
.H 3 "Speed & Space"
.P
If you don't really care about the various parts of the article body, just
do all your searches on the whole body (\fBbody\fP) -- it's faster.
.P
In fact, it's worth noting that any call to scan the article body can be
fairly time consuming because of the large amount of disk I/O required.
You will probably want to scan the article body only in certain newsgroups.
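.P
One way to do this, sketched here with an illustrative group and phrase, is to
put the cheap newsgroup test first.  Because \fB&&\fP does not evaluate its
right-hand side when the left is false, the body is read only for articles in
that group.
.Bb
if( is sci.physics && body has "cold fusion" )
	adjust 10;
.Be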
.P
Because you might do several scans on an article body, the first such scan
reads the body into memory.  On some machines, memory is limited.  This
means that only the first N bytes of the article can be scanned.  Most
machines have no limit, although a ``small model'' news filtering program
on an Intel 80286 might have a limit of 40 kilobytes or so.   This is
not a problem, as the average USENET article is only 2 kilobytes in length,
and the rare articles that exceed 40K are all source code and binary postings
that you probably don't wish to search at all.  (A news filter compiled
for ``large model'' on the 80286 need have no limit.)

.H 3 "Paragraph Scan & White Space"
.P
The NewsClip library offers the ability to scan text as a range of
paragraphs rather than a range of lines.  If you set the integer
variable \fBparagraph\_scan\fP to be true, you will activate this feature.
.P
In this mode, lines will be grouped into paragraphs before they are searched.
That means that if you are searching for a phrase, and it happens to cross
a line boundary, you will still find it.  You would not find it in the
regular line mode.
.P
To further help in this, another integer variable \fBwhite\_compress\fP
can be set to be true.  If you do this, all runs of spaces, tabs, form
feeds and newlines (in the case of paragraph mode) will be compressed to
a single space.  Thus you can search without worrying about how the poster
spaced his or her phrases.   This can also be achieved through clever use
of regular expressions.
.P
It is worth noting that these two modes default to off because it does take
time to do all this processing, and if you don't need these special features,
you won't want to spend the time on them in every article.  In many cases
you may decide that it's more important to be efficient than to properly
match every valuable USENET article.
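.P
A sketch of turning both modes on, modelled on the \fBpreserve\_case\fP
example earlier in this chapter.  The \fBextern int\fP declarations are an
assumption based on that example.
.Bb
extern int paragraph\_scan;	/* declaration form assumed */
extern int white\_compress;	/* declaration form assumed */
paragraph\_scan = true;
white\_compress = true;
/* matches even if the phrase is split across two lines */
accept if text has "cold fusion";
.Be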

.H 2 "Article Statistics"
.P
Aside from searching the body of an article, you can also get statistics
on the sizes of the various sections.
.P
The \fBbyte\_count\fP function gives the number of bytes in a text section.
For example, \fBbyte\_count(signature)\fP tells you how many bytes there
are in the signature.  With this, you can also do comparisons on the
relative sizes of article sections.
.Bb
reject if byte\_count(text) < 2 * byte\_count(signature);
.Be
rejects articles where the signature is more than half as big as the
text of the article.
.P
You can also count lines with \fBline\_count\fP.
.Bb
reject if line\_count(newtext) < line\_count(included);
.Be
rejects articles that are mostly included material from another article.
.P
These two forms of article statistics involve reading the whole body of
the article, which takes time.   If all you want is the number of lines
in the body, the header variable \fBlines\fP, while not always correct,
is usually good enough.
.P
Likewise the variable \fBarticle\_bytes\fP, which gets the size of the
whole article through the use of the Unix \fIstat\fP function, can be
a much faster way of accurately measuring the size of an article.
You can also get the integer variable \fBnum\_links\fP  (the number of links
to a cross-posted article file) and the date variable \fBwrite\_time\fP
(the most recent time the article file was written to) which are also
obtained by NewsClip from a \fIstat\fP.
.P
The string variable \fBarticle\_filename\fP lets you know the file,
if any, in which the current article resides, and the integer variable
\fBarticle\_number\fP tells you the article number, assuming it fits in
an integer on your machine.
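.P
As a sketch, the cheap header value can be tested first, so that the body
statistics are computed only for articles that survive it.  The threshold is
arbitrary, and declaring \fBlines\fP as an \fBint\fP is an assumption.
.Bb
extern int lines;		/* type assumed */
/* no body read is needed for this test */
reject if lines > 500;
/* reached only if the article was not rejected above */
reject if line\_count(newtext) < line\_count(included);
.Be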

.H 2 "Searching for Scalars (in)"
.P
You can use the \fBin\fP operator to test for the presence of a value
or set of values in an array or a database.  For example:
.Bb
reject if #alt.flame in newsgroups;
.Be
rejects articles if the newsgroup ``alt.flame'' is one of the members of the
\fBnewsgroups\fP array.   You can think of this operator as similar to the
``\(*e'' (set membership) operator from set theory.
.P
Testing for the presence of a constant newsgroup in the \fBnewsgroups\fP
array is such a common thing that we have included a shorthand for it
that you have already learned.  This is the \fBis\fP operator.  Thus
\fBis alt.flame\fP is the same as \fB#alt.flame in newsgroups\fP.
.P
You can search for integers in integer arrays, newsgroups in newsgroup
arrays, and newsgroups, userids or strings in arrays of newsgroups, userids
or strings.   You can even search for arrays in other compatible arrays.
The \fBin\fP operator returns true if any match is found.
.P
Unlike the pattern matching \fBhas\fP operator, exact matches are required
in the string comparison.
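.P
The difference shows up in a pair of tests like these (a sketch; the group
name is only an example):
.Bb
#alt.flame in newsgroups	/* true only if alt.flame itself is listed */
newsgroups has "^alt.flame"	/* also true for longer names that start that way */
.Be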
.P
You can also search for strings or string compatible arrays in databases.
More on that in the database chapter.
.P
One array you might find useful is the \fBpath\fP array.  That's an array of all
the sites that have passed along the news article.  If you are programming
the filtering of a news feed, you will want to ensure you don't send your
feed site any articles it has already seen.  If you are feeding site
``foo'', you will want to write:
.Bb
reject if "foo" in path;
.Be
If you know that ``foo'' is also fed by ``bar'' and ``baz'', you could
create a string array with those three names, or just say:
.Bb
reject if "foo" in path || "bar" in path || "baz" in path;
.Be
.H 2 "Regular Expressions"
.P
As noted above, the search strings used with the \fBhas\fP operator
are patterns from a language called ``regular expressions.''  The
basics of this system are defined in the documentation for the
\fBed\fP text editor, and the \fBgrep\fP and particularly \fBegrep\fP
file searching programs.
.P
Patterns in NewsClip follow the patterns of \fBegrep\fP, which is to
say they include the patterns of \fBed\fP, plus the following rules:
.AL

.LI
The special tag characters \\( and \\) are not supported.
.LI
A regular expression followed by a plus (\fB+\fP) matches one or more
of the regular expression.  (This is like \fB*\fP, except it matches 1
or more cases rather than 0 or more cases.)  Thus ``a+'' matches
\fBa\fP, \fBaa\fP, \fBaaa\fP and so on.
.LI
A regular expression followed by a question mark (\fB?\fP) matches 0
or 1 occurrences of the regular expression, i.e. the expression
is optional.
.LI
You can search for two different regular expressions by putting an or
bar (\fB|\fP) between them.  Either the first or the second may match.
Be warned that currently this is not very efficient.
.LI
You can group patterns together with parentheses so that the scope of
the above rules applies where you like.  ``abc(def|ghi)jkl'' matches
``abcdefjkl'' or ``abcghijkl,'' for example.
.LE
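.P
A few illustrative patterns using these extensions (sketches, not taken from
the distributed sample programs):
.Bb
subject has "colou?r"		/* color or colour */
subject has "wa+y cool"		/* way cool, waay cool, waaay cool ... */
subject has "fission|fusion"	/* either word */
subject has "cold (fission|fusion)"	/* the bar applies inside the parentheses only */
.Be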

.H 3 "Literal Patterns"
.P
Clearly there are times when you don't want to use the regular expression
metacharacters in a search string -- you wish to search for the exact
string, even if it has the special characters in it.
.P
When you have a typed-in pattern string, you can do this easily by escaping
the special characters with a backslash.  For example, \fB\\$\fP matches a
real dollar sign.
.P
If you get a string from an article and you wish to use it as a pattern,
you should pass it through the \fBliteral\_pattern\fP function.  This
escapes all special characters for you, so that pattern matching on the
resulting string will only match strings that contain the exact text of
your pattern string.  For example, \fBliteral\_pattern( "It costs $5.93" )\fP
will be the string ``\fBIt costs \\$5\\.93\fP'' with the special characters
escaped.
.P
It is important to do this if you are storing strings from an article
(like the \fBsubject\fP) in a database for later searches using a database
of patterns.
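.P
For example, a test of whether the article text repeats its own subject line
verbatim might be sketched as follows; the score is arbitrary.
.Bb
extern string subject;
/* without literal\_pattern, a subject like "c++?" would be read as a pattern */
if( text has literal\_pattern( subject ) )
	adjust 2;
.Be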



.H 1 "Databases"
.P
Quite often your news clipping decisions will be based on past history --
lists of users, topics or sites that you either like or don't like.
One particularly common desire is to ask to see or not see the followups
to a given article.
.P
To keep track of old information, and in general to read information to
and from disk, NewsClip programs use the database type.
.P
A database is effectively an array of integer values that is indexed with
\fIstrings\fP instead of numbers.  If you index into a database with a string,
you'll get back the integer value for that index, or 0 if the string isn't
in the database at all.
.P
Quite often the integer value is unimportant,
and all you care about is whether or not a given string has an entry in the
database or not.  In this case, one can use the \fBin\fP operator to find
out if a string, or any member from an array of strings, is ``in'' the
database.
.P
Let's say you have a list of USENET posters whom you don't like.  You might
want to give negative scores to all the articles they write.
.Bb
	extern userid from;
	extern int score;
	database badusers;
procedure
init() {
	badusers = fresh\_database( 5 );
	badusers["bad@foo.bar"] = -5;
	badusers["worse@site.com"] = -10;
	badusers["worst@ihate.edu"] = -20;
}
procedure
article() {
	/* this adjusts score by 0 if not found */
	adjust badusers[from];
	/* and other code, of course */
}
.Be
The above scheme needn't be just a bad user database.  You could assign
positive scores to good users, and they would be adjusted upwards using 
this scheme.
.P
If scores aren't your concern, you could also simply say
\fBreject if from in badusers;\fP
.P
The \fBfresh\_database\fP function creates an empty database,
in this case expected to hold an average of 5 indices.   The 5 is not
a hard and fast limit -- you could still have 100 or even 1000 indices, but
database use would then get very slow.   The closer your guess is to the
real number, the more optimal your use of the database will be in memory
space and speed.

.H 2 "Message-ids"
.P
Naturally, one thing you will want to keep a database of is article
message-ids.  The message-id is a string that is unique for every USENET
article posted.  Every followup has a line called \fBreferences\fP which
indicates the message-ids of the articles to which this is a followup.
(If an article is a followup to a followup -- and there are far too many
of these -- there will be two message-ids in the \fBreferences\fP array.)
.P
Let's update our program to include programmed control of a \fBmessages\fP
database.  If an article comes in from a user in our \fBbadusers\fP database,
we will reject it, of course, but we'll also store its message-id in
our messages database.  Then we can reject the followups, too.
.Bb
	extern userid from;
	extern string message\_id;
	extern string array references;
	database badusers;
	database messages;
procedure
init() {
	badusers = fresh\_database( 5 );
	messages = fresh\_database( 100 );
	badusers["bad@foo.bar"] = true;
	badusers["worse@site.com"] = true;
	badusers["worst@ihate.edu"] = true;
}
procedure
article() {
	extern int followup;
	if( from in badusers ) {
		messages[message\_id] = true;
		reject;
		}
	reject if followup && references in messages;
}
.Be
.P
This program has been set to simply reject based on the presence of a user
or ``parent'' (\fBreferences\fP) in the database.  With a little cleverness,
you could arrange to use the score system, so that all followups get the
same adjustment that their parent got when it was accepted or rejected.
.P
(We do mean cleverness, as the indexing feature doesn't work on an array
like \fBreferences\fP, and you would have to write your own loop to find
the first valid parent and its score.)
.P
Note that we don't try to use \fBreferences\fP until we know it is there
and defined.  The \fBfollowup\fP variable tells us this, although we
could also have tested if \fBreferences != nilarray\fP.
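.P
One possible shape for a score-based version is sketched below.  It is a
simplification: it adds the stored score of every known ancestor rather than
stopping at the first, and it assumes a \fBfor\fP loop may take a single
statement as its body.  The declarations and \fBinit\fP procedure are those of
the program above, with the \fBbadusers\fP values set to negative scores as in
the first example of this chapter.
.Bb
procedure
article() {
	extern int followup;
	string r;
	if( from in badusers ) {
		/* remember what this article scored */
		messages[message\_id] = badusers[from];
		adjust badusers[from];
	}
	if( followup )
		for( r in references )
			adjust messages[r];	/* adds 0 for unknown ids */
}
.Be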

.H 2 "Disk Files"
.P
All this is interesting, but not very useful if you can't have databases
remember things from session to session of news clipping.

.H 3 "Reading in a Database"
.P
To read a database from a file, use the \fBread\_database\fP function.
You provide it with the filename of the database.
.P
Database files have a particular format that includes the integer field
value, a date of last access/change and the string index.  You may, however,
create your own database files, or add lines to existing database files
quite easily.   Just add lines with nothing more than the index string.
.P
(You will read about the uses of the date of last access/change later.)
.P
In fact, your first databases, unless they are created with NewsClip programs,
will probably be simple files where each line is a database index string.
When you read in such lines, the integer value field will be set to 1,
and the date of last access set to the current time and date.  If you want
to create databases with integer values other than 1, see the reference
section.
.P
If you had a file \fB/tmp/baduser\fP that contained the lines:
.Bb
bad@foo.bar
worse@site.com
worst@ihate.edu
.Be
then you could create our badusers database with:
.Bb
badusers = read\_database( "/tmp/baduser" );
.Be
instead of the \fBfresh\_database\fP call.
.P
If you create a database in this way, you should later write it out
with the \fBwrite\_database\fP procedure described below, so that
the lines get proper dates and the database is written out in the
proper format.
.P
If the database you try to read isn't present on the disk, you'll get
an empty database -- the same as if you had coded \fBfresh\_database(30)\fP.

.H 3 "Writing the Database"
.P
To write out a database, you say:
.Bb
write\_database( base, filename, oldest );
.Be
where \fIoldest\fP is a date value that represents the oldest record that
you want written out.
.P
Some databases, most notably those of message-ids and subject lines,
will contain elements that age with time.
There is no point in keeping an old message-id in the database when nobody
is referring to the original message any more.
.P
By specifying a date argument, you can arrange for old, unused elements
of the database to ``expire,'' and not get written out.
.P
Every time a database element is created, changed, or referenced with array
indexing or the \fBin\fP and \fBhas\fP operators, the access time for the
record is
updated to the current time.  If you want to expire all elements that have
not been used in a month, provide the date value for one month previous
to the current date when you write out the database.  For example,
.Bb
extern datetime time\_now;
write\_database( messages, "/me/mymessages", time\_now - month );
.Be
will do the trick.  As you might guess, \fBtime\_now\fP contains the
current date and time, and \fBmonth\fP is a special constant that
contains the number of seconds (dates are measured in seconds) in one
month (approximately).
.P
If you write out an empty or nil database and the database file doesn't
currently exist, it will not be created.
.H 2 "Database Filenames"
.P
You may want to keep all sorts of databases.  In particular, you may
want to keep databases on both the all-article or ``global'' level, along
with a variety of small databases on a per-newsgroup level.
.P
If your database needs are small, you may find that database lookup is
fast enough that you can keep all your database information in global
databases -- even if the information is stuff that's likely to only
occur in one newsgroup like subject lines or message-ids.  If your needs
get complex, you may decide to maintain individual databases of such
things for some, or even all, of your newsgroups.
.P
To help you do this, certain escape sequences can be used in database
filenames to let you read and write your database files to the right
place in the filesystem.  All the escapes start with a tilde.
.P
The most useful ones are a single tilde (\fB~\fP) which expands to your
home directory, and \fB~n\fP, which expands to the name of the current
newsgroup (\fBmain\_newsgroup\fP) with dots mapped to slashes, the way
they are in the news spool directories.  RN keeps the kill file for
a group like ``rec.humor'' in \fB$HOME/News/rec/humor/KILL\fP.  You can
do the same by using a filename like \fB~/News/~n/badsubjects\fP.  If
your home is \fB/u/brad\fP and the newsgroup is ``rec.humor,'' this
expands to \fB/u/brad/News/rec/humor/badsubjects\fP, which would be
your bad subjects database for the newsgroup rec.humor, of course.
.P
Other escapes are possible.  They are defined in the reference section.

.H 3 "Newsgroup Specific Databases"
.P
To make use of such databases, you will want to use the \fBstartgroup\fP
and \fBendgroup\fP procedure ``entry points'' of a NewsClip program.
.P
As you learned earlier on in the manual, NewsClip programs allow
a variety of entry point procedures, most of them optional.
These two can come in handy now.
.P
The \fBstartgroup\fP procedure is executed whenever processing of a
new newsgroup starts.  That new newsgroup will be stored in the
variable \fBmain\_newsgroup\fP.   Here is where you can initialize
things for the newsgroup -- such as reading in newsgroup specific databases.
.P
The \fBendgroup\fP procedure is called when processing of a newsgroup is
finished.
.Bb
	database groupkill;
	extern string subject;
procedure
startgroup() {
	groupkill = read\_database( "~/News/~n/subjects" );
}
procedure
endgroup() {
	extern datetime time\_now;
	write\_database( groupkill, "~/News/~n/subjects", time\_now - 3*week );
	free\_database( groupkill );
	groupkill = nildatabase;
}	
procedure
article() {
	if( drop\_re( subject ) in groupkill )
		reject;
}
.Be
.P
This program maintains databases of subject lines that you don't want to
see in each newsgroup.  There doesn't have to be a file for each newsgroup.
If the file is missing, an empty database will be created that matches
nothing, and no file will be written out at the end.
.P
You will note that after writing out the database, expiring any items that
have not been seen in 3 weeks, we called the \fBfree\_database\fP procedure.
Unlike most items allocated in a NewsClip program, databases do not
stay in temporary memory -- they have to last from article to article.
.P
If you're reading in a database for each group and getting rid of it when
the group is done, you will want to free the memory it uses.  Whenever
you free the memory used by a database, you should set the descriptor
that used to refer to that database to \fBnildatabase\fP, so that no
further attempts are made to refer to it. 
.P
Another function has been introduced here -- \fBdrop\_re\fP.  This function
is intended for use on subject lines.  It removes any ``Re:'' prefixes
from the subject line.  If you are storing subject lines in a database,
you will want to do this without any ``Re:'' prefixes that might get
added by followup software.  When you search, you will want to remove
these prefixes as well.
.H 4 "File Existence"
.P
It will often be the case that local newsgroup databases will not be
present for all newsgroups.  While attempts to read such non-existent
files produce empty databases rather than errors, you may wish to avoid
even that.
.P
With the \fBexists\fP function, you can test whether a file exists at all.
You might wish to set flags if the database is not found.
.Bb
if( !exists( "~/News/~n/bigbase" ) )
	dontscan = true;
.Be
.H 3 "Creating Directories"
.P
Unfortunately, the \fBwrite\_database\fP procedure is not able
to create directories for files as needed.  If you write to a database
using \fB~n\fP in the filename, the write might well fail because
the directory structure for ``comp/sys/ibm/pc'' (for example) might not
exist in your database directory.
.P
You must create such directories by hand.  If you are using the RN
newsreader, you will find that its ``Ctrl-k'' command (for editing
kill files) will create such directories for you.  (It then places
you in the editor to edit your kill file, but you can quit without
writing out your file and the directories will still be created.)
.P
If you don't wish to have to create directories, you can use the
\fB~N\fP code instead of the \fB~n\fP code.  It creates a name
that fits in a single filename segment, doing the best it can in the
limited (usually to 14 characters) space.
.P
If the entire newsgroup name will fit in a file name, it is unaltered,
and no check character is added.
.P
For example, for ``comp.sys.ibm.pc'' you might get the file name
``\fBHCp.sys.ibm.pc\fP'' where ``H'' is a ``checksum'' character.
.P
This will normally be unique, but there is no guarantee of it.  As new
newsgroups are created, two might collide.  Use this form at your own
risk.
.H 2 "Database As Strings"
.P
The normal purpose of the database is the high-speed ``hashed'' lookup
of exact search strings.   As databases are the only items which
NewsClip programs can read and write to disk, a few other features
are available.
.P
It is possible, for example, to search all the indices in a database
with the \fBhas\fP operator.   If you write \fBdb has "a.*b"\fP, you
will be told if any of the index strings contain that pattern.
.P
It is also possible to use a database as a group of pattern strings.  If
you use a database on the \fIright hand\fP side of the \fBhas\fP operator,
then all the index strings in the database will be used as patterns to
search your text area.
.P
This allows you to keep a file of search patterns on disk, rather than
hard coding them into your program.  Searching in this way is somewhat
slower than searching for patterns hard coded into the program, however.
In some ways, keeping a database of search patterns on disk is the closest
thing to an RN kill file.
.P
Note that when you read patterns from disk, you don't need four backslashes
to match a single literal backslash, the way you do with constant patterns
in your program.  Two backslashes will do.
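.P
As a minimal sketch of this idea, the program below keeps one pattern file
per newsgroup and rejects any article whose subject matches one of the
stored patterns.  The database name \fBkillpats\fP and the file name
``patterns'' are made up for this illustration.
.Bb
database killpats;
extern string subject;
procedure
startgroup() {
	/* a missing file simply gives an empty database */
	killpats = read\_database( "~/News/~n/patterns" );
}
procedure
endgroup() {
	free\_database( killpats );
	killpats = nildatabase;
}
procedure
article() {
	/* every index string in killpats is tried as a pattern */
	reject if subject has killpats;
}
.Be
.P
Because this sketch never calls \fBwrite\_database\fP, the file on disk is
left exactly as you or your newsreader wrote it.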
.H 2 "Database FOR Loop"
.P
You can scan through all the strings in a database with NewsClip's special
variant of the \fBfor\fP loop.  The syntax is:
.Bb
for( \fIstringvar\fP in \fIdatabase\fP )
	\fIstatement\fP;
.Be
.P
This loops through the database.  On each iteration,
the string variable will have a different index value from the database.
There is no particular pattern to the order that the database is traversed,
but you will get each index once and only once.
.P
For example, you could count the number of matches of a pattern in a database
like this:
.Bb
int matches;
string dbstr;
database db;
db = read\_database( "myfile" );
matches = 0;
for( dbstr in db )
	if( dbstr has "a.*b" )
		matches++;
.Be
.P
When performing such a loop, it is possible to get the integer value for
the index in the normal way, with \fBdb[dbstr]\fP in this case.  This
is not particularly efficient, however, as this special \fBfor\fP loop
is intended to scan the strings of the database, not the values.

.H 2 "Database Uses"
.P
You can use databases as ``kill files,'' or as their reverse -- what
you might call ``keep files.''  It is possible, however, to do much
more.  As noted, databases can remember scores, and you can assign
those scores to articles that have fields in your databases.
.P
You may wish to keep lots of database files, or just a few.  If you
aren't using the database index integer value fields for anything,
you can combine several databases in one database by varying the value
field.   You might make it 1 for message-ids to keep, -1 for message-ids
to reject, 2 for users to accept, -2 for users to reject and so on.
Such strings are unlikely to ever overlap.
.P
If you don't want to keep individual databases for individual newsgroups,
you can also concatenate newsgroup names and the key strings together before
adding to and searching the database.  The \fBconcat\fP function helps you
do that.
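.P
A rough sketch of that approach follows.  It assumes a single database,
here called \fBallkills\fP, that was read in earlier, and it assumes that
a newsgroup value is converted to its name string when handed to
\fBconcat\fP, as it is for user subroutines.  The ``:'' separator is an
arbitrary choice.
.Bb
extern string concat( string, string );
extern newsgroup main\_newsgroup;
extern string subject;
database allkills;	/* hypothetical; loaded elsewhere */
procedure
article() {
	string key;
	/* build a key like "rec.humor:the subject" */
	key = concat( main\_newsgroup, concat( ":", drop\_re( subject ) ) );
	if( key in allkills )
		reject;
}
.Be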
.P
Your system manager might also keep system databases associated with
newsgroups in the news spool directories, the news library directory or
even the newsclip library directory.  Check local system documentation for
information.
.P
If you do not use the automatic \fBxref\fP feature,
you can use databases to avoid processing crossposted articles more than
once.  If you see a crossposted article during a session, you might record
its message-id in a database, and then check to avoid processing articles 
already in the database.   Normally you just import the \fBxref\fP variable
on machines that keep \fBXref:\fP lines in their articles.
.P
(Alternately, we suggest rejecting articles where the first group in
the \fBnewsgroups\fP array that is a \fBnewsrc\_group\fP is not
the \fBmain\_newsgroup\fP.)
.P
The nice thing about the database is that other programs, such as your
newsreader, can easily update your database files.  Usually all it takes
is adding a line to the end of the file containing the string you want to add.
You may be able to program macros in your newsreader that stick records into
your news filter databases with the touch of a key.

.H 2 "Notes"
.P
Assigning to a database index creates an index if it isn't already
present in the database.  You must thus be careful to only make assignments
when you actually want the entry to be in the database.   Don't
assume that assigning an integer value of 0 will stop the index from
being present in the database.
.P
If you do assign 0 to an index, you will of course get 0 back when you
try to get the value of that index.  That is also what you get back when
an index isn't there.   The only way to tell the difference between an
index that isn't there and one with a value of zero is the \fBin\fP
operator.  We advise in general against using the value zero.
.P
It is possible to use the increment (\fB++\fP) and decrement (\fB-\-\fP)
operators on a database index.  If the index doesn't exist, it gets
created at 0, and then incremented (to 1) or decremented (to -1).
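.P
For instance, the increment operator makes it easy to count how many
articles have been seen under each subject.  The sketch below assumes a
database named \fBseen\fP that has already been read in, since storing
into a nil database is an error.  The limit of 20 is arbitrary.
.Bb
database seen;		/* hypothetical; must not be nil */
extern string subject;
procedure
article() {
	string subj;
	subj = drop\_re( subject );
	/* creates the index at 0 if needed, then makes it 1 */
	seen[subj]++;
	if( seen[subj] > 20 )
		reject;
}
.Be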
.P
If you ask whether any element of an array exists \fBin\fP a database,
only the access date
on the first entry in the array that was found will be updated.
If you ask whether a string or strings match a database full of patterns,
only the first pattern in the database that matches has its access time
updated.
.P
``First'' by the way, is hard to predict, as the patterns are
tried in an internal database order that has no relation to how the
items were entered into the database, or the order they appear in
database files.
.P
While it's not usually called for, you can delete and free up individual
indices from the database with the \fBdb\_delete\fP procedure.
.Bb
mydb["foo"] = 100;
db\_delete( mydb, "foo" );
.Be

.H 3 "Message IDs"
.P
Most of the examples above were done with message-ids.  Unfortunately,
due to human error and the wide variety of posting software that exists,
not all followup messages contain a \fBReferences:\fP line with a proper
list of parent articles.
.P
This means that if you filter using message ids, you still may not get
exactly what you want.
.P
The other alternative is to use subjects.  Followups usually have
a \fBSubject:\fP line that consists of the original subject with
\fBRe:\fP prepended.   If there is buggy software, there might be
more than one \fBRe:\fP, but this is usually not the case.
.P
This is still not perfect, because many people who do followups generate
new subjects.  This is, in fact, a wise move when the subject of
a stream of articles drifts away from the original topic.  If the
\fBreferences\fP array were always correct, this would not be a problem.
.P
You can store subjects in a database and search for them just like
message-ids.  Just be sure to apply the \fBdrop\_re\fP function to
any subject lines you work with.
.P
You can also store subject lines in a database that you intend to use
as a database of patterns.  This will catch messages where the original
subject is still present, but has been expanded upon.  It's slower, but
will catch more articles.   If you do this, be sure to apply the
\fBliteral\_pattern\fP function as well as the \fBdrop\_re\fP
function before storing the subject in your database of patterns.
.Bb
extern string drop\_re( string );
extern string literal\_pattern( string );
/* to store */
datab[ literal\_pattern( drop\_re( subject ) ) ] = true;
/* to check */
reject if subject has datab;
.Be
.P
Perhaps you will find it best to work with both subjects and
message-ids.



.H 1 "Declarations"
.P
We have already discussed a few types of declarations, such as variables,
external variables and entry point procedures.
.P
To summarize, global declarations can appear anywhere in the program, outside
the bodies of procedures and functions.
.P
Local declarations can appear inside procedures and functions, at the very top.
.P
Variable declarations consist of a type name, the optional \fBarray\fP
classifier and the name of the variable.  External variables can be
imported by preceding the above with \fBextern\fP.
.H 2 "External Functions"
.P
It is also possible, and usually necessary, to make external declarations
for functions and procedures.  All the functions we have described to you
so far have been special ``predeclared'' functions.  You do not have to
write external declarations for predeclared functions, and so you have
gotten along fine so far.
.P
A sample external declaration looks like this:
.Bb
extern string left( string, int );
.Be
.P
You can make an external function declaration at either the global level
or the local level.  Global external declarations last for the rest of
the program.  Local ones exist only inside the routine they're in.
You can't make an external declaration for one of your own functions.
If you have to use one of those before you declare it in the program, you
make a forward declaration.
.P
The general syntax is:
.Bb
extern \fItype\fP \fIfuncname\fP( \fItype, type, ...\fP );
.Be
The first type is the return type.  In the parentheses, you give a list
of types for the arguments.  You must provide a type for each argument.
.P
There are many external functions available in the NewsClip library,
many more in the C library, and you can also define your own.

.H 2 "Local Procedures & Functions"
.P
Procedures and functions are almost identical in form.   Functions return
a value, and the type of value they return must be declared.  Procedures
don't return values, so the word \fBprocedure\fP is placed where you
would put the return type.  Two examples are:
.Bb
procedure
bumpcount()
{
	counter = counter + 5;
}
int
square( int value )
{
	return value * value;
}
.Be
.P
As you can see, procedures and functions can have an optional list of
\fIarguments\fP provided.   In this list, you will name arguments with
a type and an argument name.   Those arguments can be used like variables
within the subroutine.
.P
The details of procedures and functions are complex.  If you are not
already familiar with C programming, we do not advise that you make use
of procedures and functions, other than in defining the simple entry
point procedures shown in the examples.  In this section, we will discuss
only the fine points of procedures and functions, assuming that the reader
has a knowledge of C or some similar programming language.
.P
The other main difference between procedures and functions is the
\fBreturn\fP statement.  In both cases, \fBreturn\fP makes the subroutine
terminate immediately.
Inside a function, the \fBreturn\fP statement must be given an argument,
and that argument must be of a type that matches the declared return type
that came before the function name.
.P
Inside a procedure, the \fBreturn\fP statement must not be given an
argument.  It just triggers termination of the procedure.
.P
The \fBaccept\fP and \fBreject\fP statements may only be used in
procedures.   In fact, they should only be used in procedures whose
callers, all the way up to the entry point, are themselves procedures.
While this rule is not strictly enforced, if your \fBarticle\fP
procedure calls a function that
calls a procedure that does a \fBreject\fP, then the score will indeed
be set to a very negative number, but processing will not be
terminated immediately.
.P
You have, of course, already been using procedures, as the main
entry points like \fBarticle\fP are actually procedures with no arguments.
.P
Arguments must be declared, and user subroutines can only take a fixed
number of arguments.   Argument types must match or be compatible when
calling a subroutine.   Automatic type conversion is done, unlike in C.
For example, if you pass a newsgroup variable to a subroutine that takes
a string argument, the newsgroup name string will actually be passed.
.P
Arguments are passed by value only.  This means that subroutines can
use their arguments like variables, but they can't change the value
of an argument variable for the caller, and thus can't pass values back
except via the function return type.  Sadly, procedures which wish to pass
values back are limited to setting global variables.
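.P
A small sketch shows the effect.  The names \fBlastresult\fP and
\fBdoubleit\fP are invented for this example.
.Bb
int lastresult;		/* global used to hand a value back */
procedure
doubleit( int n )
{
	n = n * 2;		/* changes only the local copy */
	lastresult = n;		/* visible to the caller */
}
.Be
.P
After a call such as \fBdoubleit( count )\fP, the caller's \fBcount\fP is
unchanged, and the doubled value is found in \fBlastresult\fP.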
.P
All procedures and functions must be declared in advance, except for
a special list of predeclared routines that are part of the NewsClip
language.  You may not use an undeclared function and assume it returns
an integer, as you can in C.

.H 2 "Forward Declarations"
.P
If you plan to use a procedure or function before it is defined in your
program file, you must make a forward declaration.  A forward declaration looks
just like an \fBextern\fP declaration, except the keyword \fBforward\fP
appears in place of \fBextern\fP.
.P
For example, if a function is defined as:
.Bb
string
repeat( int n, string str )
{
	int i;
	extern string concat( string, string );
	string ret;
	ret = "";
	for( i = 0; i < n; i++ )
		ret = concat( ret, str );
	return ret;
}
.Be
Then a forward declaration would look like:
.Bb
forward string repeat( int, string );
.Be

.H 2 "Your Own Headers"
.P
While there is a predefined header variable that you can import for
just about every known USENET header, you may still wish to define your
own header variables.  You can, with a special declaration.  In place of
a regular declaration, say:
.Bb
header \fItype\fP \fIvarname\fP : "\fIkeyword\fP";
or
header \fItype\fP array \fIvarname\fP : "\fIkeyword\fP", "\fIdelims\fP";
.Be
.P
The header line prefaced by the \fIkeyword\fP will be processed and
stored into your
variable.   If it's an array variable, the field will be parsed with the
delimiters you specify in the delimiter string.   Any span of the
delimiter characters counts as one delimiter, by the way.  Typical
delimiters are space, tab and comma.
.P
You can use this to define new header lines that appear but aren't supported
by NewsClip yet.  You can also make your own definitions for standard
header lines, so long as you don't try to import the official header
variable.  The only header line you can't make your own definition for
is \fBnewsgroups\fP -- that's always a newsgroup array.
.P
You can, for example, define your own \fBXref:\fP header line, thus
avoiding the automatic processing that comes when you import the
\fBxref\fP variable.
.P
There is a special trick involving the \fIkeyword\fPs.  If your keyword
string starts with a lower case letter, then the field will be mapped
to lower case before being parsed.   If it starts with an upper case
letter, the mapping doesn't take place.   Almost all the predefined header
variables include mapping to lower case, as it's better for string comparisons
and pattern matching.
.P
You can thus use your own header declarations to stop the lower case
mapping done normally in NewsClip.
.Bb
header string subject : "Subject";
.Be
would give you a new subject variable.   While this example shows the
new variable replacing the old, we advise that you give new variables
different names when possible, so as to avoid confusion.
.Bb
header string upsubject : "Subject";
.Be
.P
Another special trick involves the delimiter string.  While you can include
space as a delimiter character, that doesn't allow you to break up things
like multi-word lists split up with commas.   If you put an \fBS\fP at the
front of the delimiter string, the S does not become a delimiter.  Instead
it signals that white space should be removed from the front and end of
element strings.   This is how the \fBKeywords\fP array is parsed, by default,
using a delimiter string of \fB"S;"\fP to do the job.
.P
Header declarations may only appear as global declarations.  You can not
place them inside subroutines.
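.P
Putting these pieces together, here is a sketch that declares a private
array for the \fBKeywords:\fP line and adds to the score when a chosen
word appears.  The variable name \fBmykeys\fP, the word ``humor'' and the
score are illustrative only.
.Bb
/* lower case keyword, so the field is mapped to lower case;
   the leading S trims white space around each element */
header string array mykeys : "keywords", "S,";
procedure
article() {
	if( mykeys != nilarray )
		if( "humor" in mykeys )
			adjust 10;
}
.Be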



.H 1 "General Notes"
.P
In this chapter we describe some of the special functions, procedures
and variables available for use in NewsClip programs.  Sometimes the
description here is just an introduction.  You should check the reference
section for more details.

.H 2 "Distribution"
.P
There may be times when you wish to examine an article based on how
far it was intended to be distributed.  This is of use in filtering feeds
to other sites, but it is also useful to readers who want to give priority
to articles posted to a local subnet of a worldwide group.
.P
USENET distribution is normally controlled by the \fBNewsgroups:\fP line, but
distribution can also be restricted by providing a further list of groups
on a \fBDistribution:\fP line.
.P
It is worth noting that while the names used on the \fBDistribution:\fP line
are usually thought of as special distribution keywords, they are actually
special newsgroups.  Thus \fBdistribution\fP is a newsgroup array.  In
fact, you can use regular newsgroup names as distributions if you want.
An article posted to ``comp.misc'' with a distribution of ``rec.humor''
would only go to machines that get both.
.P
Usually on USENET the higher level ``root'' names are used to define
hierarchies of newsgroups as well as distribution.  ``Comp'' is a newsgroup
that nobody posts to, but any machine that is fed ``comp'' will be fed all
groups that are in the ``comp'' hierarchy.
.P
We tell you all this because the USENET distribution mechanism is not
widely understood, but it's important to know about it to filter articles
with it.
.P
If the \fBDistribution:\fP line is missing, then the \fBNewsgroups:\fP line
doubles as the distribution.   As such, we advise you not to use the
\fBdistribution\fP variable, but rather the special \fBrdistribution\fP
variable.  This variable gives the \fBdistribution\fP array if that is
present, and gives \fBnewsgroups\fP otherwise.  As such it is always
defined and correct.
.P
You can check to see if an article has been explicitly limited to a
distribution like ``usa'' by using an expression like
\fB#usa in rdistribution\fP, remembering that ``usa'' is a newsgroup name
constant.   It is more likely you will want to check for anything that
starts with usa, so you might rather use \fBrdistribution has "^usa"\fP,
which uses pattern matching.
.P
Normally, however, you just want to check the distribution level -- is it
local, citywide, statewide, national or international?  In other words,
about how many machines was the article posted to?
.P
Assuming the NewsClip system has been installed properly, every newsgroup
and distribution (remember that distributions are just special newsgroups)
will have an integer \fIdistribution level\fP associated with it.
You can get this level with the function \fBdlevel(group)\fP.
.P
This number is an estimate of the number of machines that get
the given group or distribution.   It is by no means guaranteed to be
anywhere near accurate.  What is important is that the numbers are
ordered, and that wider distributions will have a higher distribution
level number than small ones.   This means that you can do comparisons.
.P
To help in this, we have defined some special fake newsgroups which
exist solely to be used as arguments to \fBdlevel\fP.  They are names
like \fB#local\fP, \fB#organization\fP, \fB#city\fP, \fB#region\fP,
\fB#state\fP, \fB#province\fP, \fB#country\fP, \fB#continent\fP,
\fB#usenet\fP and \fB#world\fP.   They will be defined to be the
right numbers for the distributions at your site that match these
geographic regions.
.P
If you want to see how far a group is going, you can ask something
like:
.Bb
if( dlevel(group) >= dlevel( #state ) )
.Be
and that should tell you if the group is statewide or larger.  Of course
you can also hard code your local statewide distribution if you want
to.

.H 3 "distribution\_level"
.P
The key to all this is a special variable called \fBdistribution\_level\fP.
It is the distribution number for the current article.  It is calculated
by looking at all the groups on the \fBDistribution:\fP and \fBNewsgroups:\fP
lines.
.P
If you want to see all the articles posted for within your city, you can
say:
.Bb
accept if distribution\_level <= dlevel( #city );
.Be
.P
You can also assign points as you like based on distribution, assigning
more or less to the score based on how wide the article was meant to go.
And of course, you can do it on a group by group basis.
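.P
As a sketch of such scoring, the fragment below, meant for the body of an
\fBarticle\fP procedure, adds points for narrowly distributed articles.
The point values are arbitrary, and any external declarations your
installation requires for \fBdistribution\_level\fP and \fBdlevel\fP are
omitted here.
.Bb
/* citywide or narrower: 20 points */
if( distribution\_level <= dlevel( #city ) )
	adjust 20;
/* strictly local: a further 30 points */
if( distribution\_level <= dlevel( #local ) )
	adjust 30;
.Be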
.P
This is similar to another trick, which shows you articles posted by
local people.  If you're in the large domain ``foo.edu,'' you can ask to see
all articles posted by locals with
\fBaccept if right(from,2) == "foo.edu";\fP or
\fBaccept if from has "foo.edu$";\fP, whichever you prefer.  Since
all articles restricted to your domain normally come from people within
the domain, this shows you the local articles.  It also shows you the
worldwide articles posted by local people.
.P
(The above only works if ``foo.edu'' is never used as a name on its own.
Use \fBright(domain(from),2)\fP if ``user@foo.edu'' might exist.)

.H 2 "Special Header Variables"
.P
In the previous section, we saw the variable \fBrdistribution\fP which
gives you the ``real distribution.''  There are similar variables for
all the other header items which have defaults.  The variables
\fBrreply\_to\fP and \fBrsender\fP default to \fBfrom\fP if their main
header line is missing, while \fBrfollowup\_to\fP and
\fBrdistribution\fP default to \fBnewsgroups\fP.
.P
You have also seen some calculated variables that depend on header
lines.  These include \fBdistribution\_level\fP, described above,
and \fBfollowup\fP which is true if the article has a
\fBReferences:\fP line.

.H 2 "The Outside World"
.P
A few variables you can import come from places beyond even the
header.  For example, when you invoke the news clipping program from
the shell, you can give options that begin with the string \fIo=\fP.
(You can spell it out as \fIoption=\fP if you like.)  The arguments
of these options are placed in the string array \fBoptions\fP.
.P
Thus if your program is called \fBnclip\fP and it gets called with:
.Bb
nclip o=quick o=+j
.Be
then the variable \fBoptions\fP will have two values, ``quick'' and ``+j.''
You can test for the presence of one with an expression like
\fB"quick" in options\fP.
.P
Aside from options, you can also examine environment variables with the
string function \fBgetenv\fP.  You might tell users to define an
environment variable ``CLIPOPTS''.  A call to \fBgetenv("CLIPOPTS")\fP
would return the string they provided, or \fBnilstring\fP if the environment
variable were not defined.
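.P
The two mechanisms can be combined.  In this sketch, a ``quick'' mode is
turned on either by \fBo=quick\fP on the command line or by the word
appearing in the ``CLIPOPTS'' environment variable.  The flag
\fBquickmode\fP is a name invented for the example, and the external
declarations are assumptions based on the descriptions above.
.Bb
extern string array options;
extern string getenv( string );
int quickmode;		/* starts out 0, as an uninitialized global */
procedure
startgroup() {
	string envopts;
	if( options != nilarray )
		if( "quick" in options )
			quickmode = true;
	envopts = getenv( "CLIPOPTS" );
	if( envopts != nilstring )
		if( envopts has "quick" )
			quickmode = true;
}
.Be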
.P
There will be times when you want to check for messages from people
at your own site, and particularly for messages from yourself.  While
you can code this explicitly with something like \fBfrom == "me@mysite"\fP,
we have provided two external variables that contain the mail address and
the site/domain of the person running the newsclip program.
.Bb
extern string my\_domain;
extern string my\_mail\_address;
if( domain(from) == my\_domain ) {
	adjust 20;
	if( from == my\_mail\_address )
		adjust 100;
}
.Be
.P
You will want to use these variables if you are writing newsclip programs
or subroutines for use by other people.
.P
In the above example, you may have noticed the string function \fBdomain\fP,
which returns the string after the first at-sign (@) in the argument it
is given.  When passed a \fBuserid\fP variable, this gives the \fIsite\fP
or \fIdomain\fP part of the address.
.P
Note that there is no way to find out the local machine's full domain
name under program control, so these variables will only be valid if the
person who installed the NewsClip system set them up properly.
.P
Another useful system value is the \fBdatetime\fP value \fBtime\_now\fP,
which gives the current time, or at least the time when your news clipping
program started running.   You have already seen this used in the
\fBwrite\_database\fP examples.

.H 2 "Nil Variables"
.P
Header variables that are not defined for an article, along with
uninitialized global variables, all have what are known as
``nil'' values.  (Uninitialized local variables have undefined values --
you can't be sure at all what they are.)
.P
Before using a header variable that might not have been set for an
article, you should always check first to see if it is nil.  If you
try any operators on nil arrays, strings or userids, you may crash your
program -- particularly if you assign into an index of a nil or undefined
array variable.
.P
Remember, all header variables are set to 0 or nil with each new article.
.P
Unlike in C, you must explicitly compare your values with special nil
constants.  You can't just use an array variable alone in an if, as in:
.Bb
if( arrvar )
	whatever;
.Be
.P
Instead use:
.Bb
if( arrvar != nilarray )
	whatever;
.Be
There are similar values known as \fBnilstring\fP, \fBniluserid\fP and
\fBnilnewsgroup\fP.   Nil integers and dates are set to zero.
.P
There is also a special value called \fBnildatabase\fP.  If you try to
index into a nil database to read values, you will always get 0, as though
the index were not found.  Any attempt to store into a nil database or
free one will generate an error.
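.P
For example, the \fBDistribution:\fP line is often absent, so a careful
program tests the array before looking inside it.  This sketch assumes
\fBdistribution\fP is imported as a newsgroup array, as described
earlier.  The score is arbitrary.
.Bb
extern newsgroup array distribution;
procedure
article() {
	if( distribution != nilarray )
		if( #usa in distribution )
			adjust 10;
}
.Be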

.H 2 "Global Control & Subscription"
.P
Right now, newsreading programs usually handle the details of what newsgroups
a person subscribes or does not subscribe to.  In general, this means that
the NewsClip program's only job is to filter articles in subscribed
newsgroups.
.P
This doesn't have to be so, of course.  For example, if your \fBarticle\fP
procedure were to include a line like:
.Bb
reject if is comp.misc;
.Be
that would effectively ``unsubscribe'' you to that group.   It would
also reject all crosspostings from that group into groups you read.
.P
We have provided a more efficient way of doing this, by providing some
variables you can set in the \fBstartgroup\fP procedure of your clipping
program.
.P
These variables are called \fBreject\_all\fP and \fBaccept\_all\fP.  If
you set \fBreject\_all\fP in your \fBstartgroup\fP procedure, that is the
same as unsubscribing to a group.  All articles will be rejected until
we are finished processing that group.
.P
This is more efficient than including the sample \fBreject if\fP statement
above, for if \fBreject\_all\fP is set, then the articles don't even get
looked at.  They are rejected out of hand.    These two special variables
are reset whenever a clipping program finishes with a newsgroup, so they
only apply to one newsgroup at a time.
.P
The variable \fBaccept\_all\fP does the reverse.  All articles in the
group are accepted, without even being examined.  This effectively turns
off the use of your \fBarticle\fP filtering procedure for the duration of
that group.
.P
You could set up to unsubscribe to groups with code like:
.Bb
procedure
startgroup() {
	extern newsgroup main\_newsgroup;
	extern int reject\_all;
	switch( main\_newsgroup ) {
		case #comp.misc:
		case #news.software.b:
		case #talk.politics.misc:
			reject\_all = true;
		}
}
.Be
.P
Of course, you could also do the reverse of this, by doing nothing for
the groups named in the \fBcase\fP statements, and turning on
\fBreject\_all\fP by default.
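.P
A sketch of that reverse arrangement follows.  The group names are
examples only, and the local flag \fBwanted\fP is invented for the
illustration.
.Bb
procedure
startgroup() {
	extern newsgroup main\_newsgroup;
	extern int reject\_all;
	int wanted;
	wanted = 0;
	switch( main\_newsgroup ) {
		case #rec.humor:
		case #comp.lang.c:
			wanted = 1;
		}
	/* anything not listed above is unsubscribed */
	if( !wanted )
		reject\_all = true;
}
.Be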
.P
The use of these variables can be important in the \fIpipe\fP mode, where
your program talks to a newsreader.  By setting one of these two special
variables, you tell that newsreader that it doesn't have to even talk to
the news filter program for the duration of a newsgroup.
.P
These variables should not be used at all in \fIfilter\fP mode, because in that
mode there is no such thing as a ``current newsgroup.''

.H 3 "Named Groups"
.P
One good idea is to set \fBaccept\_all\fP for all groups that aren't
actually named in your program.  If the group isn't named in your
program, you probably don't care to scan it.  There's a handy function
to do this for you.
.Bb
extern newsgroup main\_newsgroup;
extern int accept\_all;
extern int named\_group( newsgroup );
if( ! named\_group( main\_newsgroup ) )
	accept\_all = true;
.Be
A ``named group'' is one that you named as a newsgroup constant
(\fB#rec.humor\fP) or inside an \fBis\fP operator.
.P
If you deal with groups in general, for example by checking for
the ``comp'' prefix on the front of the group name, such groups will
not be considered named groups.  You will have to add checks for such
groups to ensure you don't set \fBaccept\_all\fP on them.
.P
As you might guess, \fBnamed\_group\fP returns true if the group is
a named one, and false otherwise.
.P
This is a very useful thing to do when working with a newsreader in
pipe mode, as it will know to not bother sending you messages in groups
you don't wish to filter.

.H 2 "String Tools"
.P
To help you manipulate the strings that show up in news article
headers, a number of special string functions are available.  Most
of these have to be imported as externals.
.P
It is important to note that string values are allocated in what we call
a ``temporary'' pool of memory.  That means that the memory for all normal
strings is erased and re-used with each new article.  If you assign a
string variable during the processing of one article, you can't expect
it to be around for the next one.
.P
This is usually only a problem for code in sections outside the
\fBarticle\fP procedure.   Strings created in the \fBstartgroup\fP procedure
still go away with the first article.   If that isn't what you want, you
can create permanent strings with the \fBpermstring\fP function.  Just
say \fBstr = permstring( tempstr )\fP to make a string permanent.  The
two variables can even be the same.
.P
Don't make too many strings permanent, or you will run out of memory on
some machines.
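.P
As a sketch, the fragment below builds a label in \fBstartgroup\fP and
keeps it for use while the group's articles are processed.  The variable
\fBgrouplabel\fP is a made-up name, and the external declarations are
assumptions.
.Bb
extern string permstring( string );
extern string concat( string, string );
extern newsgroup main\_newsgroup;
string grouplabel;
procedure
startgroup() {
	/* without permstring, this string would vanish
	   with the first article */
	grouplabel = permstring( concat( "group ", main\_newsgroup ) );
}
.Be
.P
Note that this makes one new permanent string per newsgroup, so it is
only suitable when the number of groups is modest.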

.H 3 "Dots & Domains"
.P
Many key strings on USENET are delimited with dots.  In
particular, newsgroup names and domain names are split into parts this
way.
.P
We have provided two functions to take out substrings from such names.
The \fBleft\fP and \fBright\fP functions give you left or right parts
of such strings, delimited with dots.   You provide a string and the
number of parts you want.
.P
For example, \fBleft( "comp.sys.ibm.pc", 1 )\fP is ``comp'' and
\fBleft( "comp.sys.ibm.pc", 2 )\fP is ``comp.sys,'' as you might
expect.   \fBRight( "me.my.cs.edu", 1 )\fP is ``edu,'' the top level
domain.
.P
You can check for left and right parts with pattern matching, but often
the use of \fBleft\fP and \fBright\fP is more efficient.  If you ask for
more parts than there are, you get the whole string.
.P
Quite often when you get an email address from a field like \fBfrom\fP,
you wish to examine just the ``site name'' or ``domain'' part of it.
The \fBdomain\fP function does this for you.  For example,
\fBdomain("user@foo.bar.com")\fP is ``foo.bar.com.''
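.P
As a rough model of the behaviour just described, the following Python
sketch mimics \fBleft\fP, \fBright\fP and \fBdomain\fP.  It is an
illustration only, not NewsClip code and not the library's implementation.
What happens to an address with no ``@'' in it is our assumption; the text
above does not define it.
.Bb
```python
def left(name, parts):
    """Return the first `parts` dot-delimited pieces of `name`."""
    pieces = name.split(".")
    # Asking for more parts than exist yields the whole string.
    return ".".join(pieces[:parts])


def right(name, parts):
    """Return the last `parts` dot-delimited pieces of `name`."""
    pieces = name.split(".")
    if parts >= len(pieces):
        return name
    return ".".join(pieces[len(pieces) - parts:])


def domain(address):
    """Return the site part of a user@site address."""
    # Assumption: with no "@" the string is returned unchanged.
    return address.rsplit("@", 1)[-1]
```
.Be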

.H 3 "Length and Indexing"
.P
You can get the length of a string with \fBstrlen\fP.  It returns an
integer.   You can index any character in a string, from character 0,
to the character at position \fBstrlen(string)-1\fP with the
\fBchindex\fP function.
.P
\fBChindex("ABCD", 2)\fP is the integer 67, which is the ASCII code
for the letter ``C.''   You can express character constants in single
quotes if you wish to do comparison.  You will find expressions like
\fBchindex(mystr, 3) == 'S'\fP can do the trick for you.
.P
You can't assign characters into a string.  You can just read them out.
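.P
The indexing rule can be modelled in a couple of lines of Python.  This is
a sketch for illustration, not NewsClip code.  An out-of-range position
raises an error in the model; what NewsClip itself does in that case is not
stated here.
.Bb
```python
def chindex(text, position):
    """Return the character code at a zero-based position."""
    # Valid positions run from 0 to len(text) - 1.
    return ord(text[position])


# A comparison against a character constant, in the spirit of the
# chindex(mystr, 3) expression shown above.
starts_with_r = chindex("Re: hello", 0) == ord("R")
```
.Be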
.H 3 "Subjects"
.P
Most USENET subject lines are from followups, and as such they have
one or more instances of ``Re:'' on the front.  The function
\fBdrop\_re\fP takes a subject string, and returns the string without
any spaces or ``Re:'' prefixes on the front.  In many programs, this is
the string you want to examine, so you will see things like:
.Bb
extern string subject
{procgap}
string realsubject;
realsubject = drop\_re( subject );
.Be
at the front of the \fBarticle\fP procedure.  All subsequent tests are then
done on the variable \fBrealsubject\fP.
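.P
To make the effect of \fBdrop\_re\fP concrete, here is a small Python model
of it.  It is a sketch only.  We assume the prefix is matched without regard
to case and that spaces between repeated prefixes are also dropped; the
real function may differ in such details.
.Bb
```python
def drop_re(subject):
    """Strip leading spaces and "Re:" prefixes from a subject line."""
    text = subject.lstrip(" ")
    # Assumption: the prefix is matched without regard to case.
    while text[:3].lower() == "re:":
        text = text[3:].lstrip(" ")
    return text
```
.Be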
.H 3 "Concat"
.P
You can concatenate two strings together with the \fBconcat\fP function.
If you want to concatenate more than two strings, just call \fBconcat\fP
several times.  For example,
.Bb
allthree = concat(  concat( s1, s2 ), s3 );
.Be
works fine to concatenate three strings.
.P
You can use this to build filenames or database keys or whatever you
like.

.H 2 "Cross Posting"
.P
Many articles are cross posted to several groups.  In our typical
program, we did a loop so that our group-specific filtering
routines were performed for each group in the list.   You may or may
not always wish to do this.
.P
If an article is cross posted to several groups you read, you usually
don't want to see it twice.  To arrange this, simply put in an import
declaration for the \fBxref\fP variable.
.Bb
extern string array xref;
.Be
.P
If you do this, and you're running in the \fInewsrc\fP mode, then when
a cross-posted article is rejected, it will get marked read in all the
subscribed groups in which it is found.
.P
This is normally what you want, but it is also not the default.  If,
as suggested in the chapter on a typical program, you
do a loop that checks all groups an article is posted to, then if an
article is rejected in one group, it will be rejected in all the others
as well, as it runs through the same code.  NewsClip is fast enough
that this may be all you need to do the job.
.P
If you don't run the articles through the same code, you may get articles
accepted in one group and rejected in another.   Sometimes this can
be what you want.   Remember that many modern newsreaders make sure that
they don't show you the same article more than once.  By having individual
control over what groups an article is accepted in, you can control what
group you read articles in.  With most newsreaders, you will see the
article in the first group in your reading order.
.P
If you are producing a file list (i.e., not running in \fInewsrc\fP mode), it
is fairly important that the same article not be listed twice in two
different places.  We advise that you import the \fBxref\fP variable in
this case.   When you do, even articles that are accepted get marked off
as read in the other groups, so that nothing gets listed twice.
.P
There are other ways to eliminate crosspostings, of course.  One suggestion
on this is described in the database chapter.   If your system does not
support the \fBXref:\fP header line, you will need to code up such
a system, and it may even be more efficient.
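.P
The body of an \fBXref:\fP line is a site name followed by
\fIgroup\fP:\fInumber\fP pairs, one for each group the article was filed in.
The Python sketch below shows, under that assumption, how a cross-posted
article could be marked read in every subscribed group at once.  The
function names and the \fIread\_marks\fP structure are hypothetical, chosen
for the illustration; they are not part of NewsClip.
.Bb
```python
def parse_xref(xref_body):
    """Split an Xref body into a {group: article_number} mapping."""
    fields = xref_body.split()
    locations = {}
    # The first field is the site name; the rest are group:number pairs.
    for field in fields[1:]:
        group, _, number = field.rpartition(":")
        locations[group] = int(number)
    return locations


def mark_read_everywhere(xref_body, subscribed, read_marks):
    """Record a cross-posted article as read in each subscribed group."""
    for group, number in parse_xref(xref_body).items():
        if group in subscribed:
            read_marks.setdefault(group, set()).add(number)
    return read_marks
```
.Be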

.H 2 "Entry Points"
.P
You are now very familiar with the \fBarticle\fP entry point procedure
that is called with each new article.  Here is a complete list of
the entry point procedures.
.P
Remember that most of these entry points are \fIoptional\fP.  That means
if you create one, but spell its name wrong, you will get a procedure
that never gets called, and a null procedure where you wanted a real one.
.H 3 "Init"
.P
Code that is run when the newsclip session first begins.  This
allows the initialization of global variables.   You will want to
load up your global databases and set any of the special control
variables here, too.
.H 3 "Startgroup"
.P
Code that is run whenever a new newsgroup is processed.  Here is where
you can test for and load newsgroup related databases.  This procedure
is given an argument, which is an estimate of the number of unread articles
in the group.  The estimate may not be very accurate, and will usually only
be good in \fInewsrc\fP mode.
.H 3 "Endgroup"
.P
Code that is run when we're done with a newsgroup.  Here is where you can
write out your newsgroup related databases and free them.
.H 3 "Article"
.P
Code that is run with each article.  This code decides whether
to accept or reject the article.  The default is to accept.
.H 3 "Post_article"
.P
Code that is run after the \fBarticle\fP routine terminates.
Such code can examine the score and take special further action.
This procedure is given an integer argument, namely the score.  (This
is also available in the global \fBscore\fP variable.)
.H 3 "Terminate"
.P
Code that is run when the newsclip session is done.  In this
section, one usually writes out any updated information to disk.
.H 3 "Command"
.P
Code that is run when a kill command comes down the pipe in \fIpipe\fP
mode.  This procedure gets a string argument, which is the command
string.  It is expected to either \fBreject\fP the command, or process
it and issue an \fBaccept\fP.

.H 2 "Advanced Statements"
.P
There are a few statements we haven't covered at this point.  They are
meant for advanced users.  Some come from C, others are new to NewsClip.
Full descriptions can be found in the reference section.
.H 3 "Control Flow"
.P
From C, we have included four basic control flow statements.  \fBBreak\fP
jumps out of an enclosing loop or \fBswitch\fP statement.  \fBContinue\fP
jumps immediately to the top of any current \fBfor\fP or \fBwhile\fP
loop, so that the next iteration can be executed.
.P
Finally, there is the general \fBgoto\fP statement, which jumps to a label
within a subroutine.
You can label any statement by placing an identifier and a colon in front
of it.
.P
The \fBreturn\fP statement terminates a subroutine.  Inside a function, it
must be given an argument that is the function return value.
In procedures it must not be given an argument.  The \fBaccept\fP and
\fBreject\fP statements are variants of \fBreturn\fP which set the
\fBscore\fP.
.H 3 "Dynamic Arrays"
.P
Advanced users can create arrays of any size they choose.  The array is
allocated in temporary memory.  Try:
.Bb
\fIarrayvar\fP = array \fIexpr\fP;
.Be
where the \fIexpr\fP should be an integer expression giving the desired
size of the array.  The array will index from 0 to \fIexpr\fP minus 1.
For example:
.Bb
string array myar;
myar = array 20;
.Be
creates a string array with 20 elements, indexed from 0 to 19.
.H 3 "String Parsing & Type Conversion"
.P
You can also access the same parsing routines that turn header fields
into header variables.   For scalar (non array) variables, try:
.Bb
parse \fIvar\fP = \fIstring-expr\fP;
.Be
This converts the string into the right type and stores it in the variable.
You can convert newsgroup names to newsgroup numbers, or dates to date
values this way.  It is also a handy way of converting a string to an
integer.
.P
The array form is:
.Bb
parse \fIarrvar\fP = \fIstring-expr\fP, \fIdelims\fP;
.Be
The second string expression defines the delimiters that will be used
to parse the array.  The same rules apply here as for the \fBheader\fP
declaration described elsewhere.  For example:
.Bb
newsgroup array myn;
parse myn = "rec.autos,news.groups", " ,";
.Be
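.P
The splitting step of the array form can be pictured with the Python model
below.  It is a sketch only: it splits the string at any delimiter
character, but does not perform the conversion of each piece to the array's
element type that the real \fBparse\fP statement does.  Dropping empty
pieces between adjacent delimiters is our assumption.
.Bb
```python
def parse_list(text, delims):
    """Split `text` at any character found in `delims`."""
    items = []
    current = ""
    for ch in text:
        if ch in delims:
            # Assumption: runs of delimiters produce no empty entries.
            if current:
                items.append(current)
            current = ""
        else:
            current += ch
    if current:
        items.append(current)
    return items
```
.Be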



.H 1 "Compiling & Operation"
.P
A brief introduction to compiling and running your NewsClip filtering
programs was given in chapter 2.   We will now explore this area
in more detail.

.H 2 "Compiling"
.P
The \fBncc\fP compiler compiles your programs by translating them into
C programs, compiling these with your C compiler, and linking the result
with the NewsClip library.
.P
The translation is fairly simple as compilations go, other than providing
for special conversions for NewsClip's data types.  It is the library
that does most of the work, and thus makes it easy to write a
NewsClip program.
.P
When you compile with
.Bb
ncc myprog.nc
.Be
everything is done in one step.  The generated C source is placed in \fBmyprog.c\fP,
that is compiled, including a special file of definitions (usually found
in \fB/usr/lib/news/newsclip/ucode.h\fP), and this is linked with the library,
usually found in \fB/usr/lib/news/newsclip/cliplib.a\fP.   The C program
source is left around for you to examine.   The executable program,
ready to run, is placed in the file \fBnclip\fP in your current
directory.
.P
You can alter this a bit if you like.  For example, you can skip the
C compile and link stage with the \fI-link\fP option, allowing you to
examine the resulting C program and compile it on your own.   Options
are described later.

.H 3 "Preprocessor"
.P
The \fBncc\fP compiler passes your input program through the
``C preprocessor.''  This is the same macro language and conditional
compilation facility that C uses.  CPP \fIdirectives\fP are all keyed by lines
that begin with a ``#'' character.   These include the
\fB#include "filename"\fP directive, which causes the contents of the named
file to be inserted into the compilation stream.
.P
If you have a lot of little filtering routines for each newsgroup that
you put in individual files, you can get them all combined together
when you compile with \fB#include\fP directives.  Your big \fBswitch\fP
statement might look like:
.Bb
	for( n in newsgroups ) switch( n ) {
#include "news/admin/kill.nc"
#include "news/groups/kill.nc"
#include "sci/physics/kill.nc"
#include "comp/sys/ibm/pc/kill.nc"
#include "rec/humor/kill.nc"
#include "rec/humor/funny/kill.nc"
		}
.Be
.P
You could then edit each file individually, as desired.
.P
Other directives include \fB#define\fP, which defines manifest constants
and macros, and \fB#ifdef\fP/\fB#else\fP/\fB#endif\fP which allow
conditional compilation based on whether or not a symbol has been defined
with \fB#define\fP or a command line option.
.P
A full exploration of CPP is beyond the scope of this manual.  See
documentation on the C language, as well as the ``man'' entry for
CPP in your own system's documentation.
.H 3 "Options"
.P
You can control the compiling process to some degree by providing options
to the compiler.
.P
The compiler's primary argument is the sole input source file, which by
convention should end with the ``.nc'' (for NewsClip) extension.
.P
Untagged arguments with an extension of ``.c,'' ``.o'' or ``.a'' will not be
treated as NewsClip source programs, but rather as C source code, system
object code or library files.
They will be passed directly to the C compiler to be linked in with your
program.
.P
The other options use LGS's own option style, which is a variant of the
conventional Unix option style.   Binary (on/off) options are preceded by
a plus ``\fB+\fP'' or minus ``\fB-\fP,'' where plus means the option
is turned on, and minus means the option is turned off.  You can type
a whole option name after the ``+/-,'' or just enough to uniquely
distinguish the option -- usually just a single letter.   Thus
\fI-link\fP works as well as \fI-l\fP.
.P
Valued options are written with a keyword (or perhaps the single letter
abbreviation of the keyword), an equals sign ``\fB=\fP'' and a string
value.   For example, \fIo=myclip\fP.
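.P
The option style can be summarized with a short Python model.  This is a
sketch of the rules as described above, not the compiler's actual parser.
The option names are those documented in this section; treating
abbreviations as case-sensitive (so that \fID\fP selects ``Define'' and
\fIi\fP selects ``intermediate'') is our assumption.
.Bb
```python
def resolve(name, known):
    """Expand an abbreviation to the single option name it matches."""
    matches = [opt for opt in known if opt.startswith(name)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous option: " + name)
    return matches[0]


def parse_args(args, binary, valued):
    """Sort arguments into switches, valued options and plain files."""
    switches, values, files = {}, {}, []
    for arg in args:
        if arg[0] in "+-":
            # Plus turns a binary option on, minus turns it off.
            switches[resolve(arg[1:], binary)] = arg[0] == "+"
        elif "=" in arg:
            key, _, value = arg.partition("=")
            values.setdefault(resolve(key, valued), []).append(value)
        else:
            files.append(arg)
    return switches, values, files


NCC_BINARY = ["link", "externals"]
NCC_VALUED = ["output", "Define", "Include", "intermediate", "ccoption"]
```
.Be
.P
In this model, the arguments \fIsrch.nc o=srch -l\fP come back as one
source file, an ``output'' value of ``srch'' and the ``link'' switch
turned off.  Values are kept in lists because some options may be given
several times.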

.H 4 "-link"
.P
The \fI-link\fP option disables the C compile and link phase of compiling.
No executable program will be produced.  A C program with the same name
as your source file (but with an extension of ``.c'') will be produced,
assuming there are no errors.

.H 4 "output=pathname"
.P
This option specifies a name for the executable news
filtering program.   The default is \fBnclip\fP.

.H 4 "Define=defstring"
.P
This option specifies a preprocessor definition to be
passed along to the C preprocessor.  For example, \fID=bsd\fP would
cause the manifest symbol ``bsd'' to be defined in \fB#ifdef\fP tests.
You can specify several of these.

.H 4 "Include=dirpathname"
.P
This specifies a directory that the preprocessor
should search for files included with the \fB#include\fP directive.
You can specify several of these.

.H 4 "intermediate=file.c"
.P
This allows you to specify an alternate
intermediate name for the generated C program.  Normally this name will
be derived from the name of the source file.  The provided name must end
with ``.c.''

.H 4 "ccoption=option"
.P
This lets you specify a string that
is to be passed directly along to the C compiler for the compile and
link phase.  You can pass any special local options your C compiler
needs.

.H 4 "-externals"
.P
The \fI-externals\fP option disables the ability of users to make
external import declarations of symbols other than those in the
approved list of the NewsClip language.  This limits the language
to the definition in this manual.
.P
This is only a very mild security feature, and any capable malicious
programmer could get around it fairly easily.  If you are going to
allow remote sites to submit newsclip feeding programs to you, it is
important that you create independent system userids for these programs,
and run them with the real and effective userid properly set.  Do
not use the ``uucp'' or any other system userid.
Depend on operating system tools for all your security, not this option.

.H 3 "Single-User"
.P
If you only have a single user copy of NewsClip, and, because you
are not a system administrator, you have been unable to install
NewsClip files in system directories, then the files \fBcliplib.a\fP
and \fBucode.h\fP must be in your current directory when you compile.

.H 2 "Externals"
.P
So long as the \fI-externals\fP compiling option is not used, NewsClip
programs may make external declarations for arbitrary C routines.  This
includes routines from the standard C library, or routines from
special C source or object code modules provided on the \fBncc\fP
command line.
.P
For users willing to write their own C code, the potential here is
truly unlimited.   The NewsClip language has been designed to be
simple and special purpose.  There are some less common things that
are simply not easy to do within it.  External functions can do all
this for you.
.P
Even if you have source code to the NewsClip compiler, we advise you
to do any special tricks with your own C code, rather than by changing
the compiler to extend the language.   Neither route is officially
supported, but the former is preferred.
.P
Important note: Since the case of letters in NewsClip doesn't matter,
all C externals must be entirely in lower case.  If you want to call
an existing routine that has upper case letters in its name, you will
have to write a small interface routine to do the calling.  With variables
that have upper case names, you will be out of luck.

.H 2 "Filtering"
.P
Once you have compiled your program, there are several ways you can
run it to filter news articles.   We'll assume your program is in
\fBnclip\fP for now.   First of all, \fBnclip\fP has a number of
command line options you can use to control its operation.
.P
Most important are the ``modes'' of operation, specified with the
\fImode=\fP option.  Essentially, you have written a subroutine which,
when passed an article, decides whether to accept or reject that article.
The control portion of the \fBnclip\fP program sets up how the articles
will be gathered and submitted to your procedure, and what will be done
with the results.
.P
You are already familiar with \fInewsrc\fP mode, which you get by
using the \fImode=newsrc\fP option.  We will explain it in more
detail here.

.H 3 "Newsrc Mode (mode=newsrc)"
.P
In \fInewsrc\fP mode, the \fBnclip\fP program processes a standard
format \fB.newsrc\fP file.  Most newsreaders keep track of what the
user has read with a file of this name in the home directory.  The
RN newsreader also keeps other files in the same directory as this file.
.P
In \fInewsrc\fP mode, \fBnclip\fP also keeps a file
called \fB.newsrclas\fP to keep track of the last article that has been
processed by the \fBnclip\fP program in each desired newsgroup.  This
is necessary because it's not possible to tell where to start processing
just from the \fB.newsrc\fP file and the news \fBactive\fP file.
.P
When run in \fInewsrc\fP mode, \fBnclip\fP examines the \fB.newsrc\fP
file, \fB.newsrclas\fP file and the USENET active file
(usually \fB/usr/lib/news/active\fP).  From these it calculates the
range of unread articles that must be processed.
.P
First it calls your \fBinit\fP procedure.
.P
It then loops through the subscribed newsgroups in the \fB.newsrc\fP
file.  As it starts each group, it calls your \fBstartgroup\fP procedure.
It then goes through all the appropriate articles, and calls your
\fBarticle\fP procedure on each one.   Each rejected article is marked
as read.  When the group is done, the \fBendgroup\fP procedure is called.
.P
When all is done, the \fBterminate\fP procedure is called, and the
\fB.newsrc\fP file is written out, with all the rejected articles marked
as read.   The \fB.newsrclas\fP file is written out with all articles
marked as processed.   (This way, if you call \fBnclip\fP again immediately,
it will do nothing unless new articles have arrived on your machine.)
.P
Some options and environment variables affect this procedure.  See below.
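.P
The order of calls in \fInewsrc\fP mode can be pictured with the following
Python sketch.  It is a model for illustration, not the control code of
\fBnclip\fP.  The \fIprogram\fP dictionary and the arguments handed to each
callable are hypothetical stand-ins for your entry point procedures.
.Bb
```python
def run_newsrc_mode(groups, program):
    """Drive the entry points in the order used by newsrc mode.

    `groups` maps each subscribed group to its unprocessed article
    numbers; `program` supplies the entry points as callables.
    """
    rejected = {}
    program["init"]()
    for group, articles in groups.items():
        program["startgroup"](group, len(articles))
        for number in articles:
            # A false result stands for a reject statement.
            if not program["article"](group, number):
                rejected.setdefault(group, []).append(number)
        program["endgroup"](group)
    program["terminate"]()
    # The caller would mark these as read in the newsrc file.
    return rejected
```
.Be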
.H 3 "Filter Mode (mode=filter)"
.P
This mode works quite differently, and does not even involve the
\fB.newsrc\fP or \fBactive\fP files.  Instead, it expects a list of
filenames to appear on the standard input.  Each file should be a
USENET article file.   Each such article will be passed to your
\fBarticle\fP procedure.   If the article is accepted, its filename
will be written to the standard output.  If the article is rejected,
nothing is written.
.P
The result is a filtered list of accepted filenames.   This is ideal
for controlling a batched feed to another site.  Many news systems run
by having the news processing programs output a list of article files
to a special file.  Periodic programs examine this file and batch together
the articles found in it.
.P
Simply modify your batching procedure to have the file processed by
.Bb
nclip <batchfile
.Be
and feed the output list into your batcher.  Beware that it might be
empty!
.P
Note that the entry point procedures \fBstartgroup\fP and \fBendgroup\fP
will not be called in this mode, as there is no definition of when a
group starts and when a group ends.
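.P
In outline, \fIfilter\fP mode behaves like the Python model below.  It is a
sketch only; the \fIaccept\fP callable is a hypothetical stand-in for your
\fBarticle\fP procedure.
.Bb
```python
def filter_file_list(lines, accept):
    """Return the filenames whose articles are accepted."""
    kept = []
    for line in lines:
        filename = line.strip()
        # Rejected articles simply produce no output line.
        if filename and accept(filename):
            kept.append(filename)
    return kept
```
.Be
.P
The returned list can be empty, which is the case your batcher has to
cope with.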
.H 3 "Batch Mode (mode=batch)"
.P
This mode is an alternative to \fIfilter\fP mode for generating a list
of accepted article files.  Instead of taking input from a file list, it
takes it from a \fB.newsrc\fP and \fBactive\fP file, just like
\fInewsrc\fP mode.
.P
The accepted files have their filenames printed to the standard output.
The \fB.newsrc\fP file is updated to mark \fBall\fP the articles as
read, whether they were accepted or rejected.  This makes the counts in
the \fB.newsrclas\fP file somewhat redundant, but they are still used,
as it makes the process more efficient.
.P
This way, you can maintain a feed through a \fB.newsrc\fP file, and
have no entry in the news \fBsys\fP site subscription file.  You get
control on a newsgroup by newsgroup basis, and of course the full
filtering ability of \fBnewsclip\fP.   The one thing that is not done for you
is the adding of new newsgroups in subscribed hierarchies.
To do this, you must process \fBControl\fP messages in the \fBcontrol\fP
newsgroup, and use the \fBsubscribe\fP procedure to add them to the
\fB.newsrc\fP.  The sample program \fBfeed.nc\fP shows how to do this.
.P
In this mode, the \fBstartgroup\fP and \fBendgroup\fP entry points
are used.
.P
When using \fIbatch\fP mode, it is advisable to use the \fInewsrc=\fP
option to explicitly specify the location of the \fB.newsrc\fP file.
.P
In \fIbatch\fP mode, it is strongly suggested that your programs import
the \fBxref\fP variable.  (You don't need to do anything with it, just
extern it.)  This will assure that cross posted articles are not
examined or accepted twice.  If your news system does not support
the \fBXref:\fP line, then you must use another scheme to avoid
duplicating crossposts.   See the sample feed program for details.

.H 3 "List Mode (mode=list)"
.P
This mode reads from a \fB.newsrc\fP and \fB.newsrclas\fP file,
and outputs a list of accepted article filenames, just like \fIbatch\fP
mode.   It does not, however, update the \fB.newsrc\fP file, so if you
run it multiple times, you will get the same list, or possibly an
extended one if new articles have arrived.
.P
All the same warnings that apply to batch mode apply here.

.H 3 "Pipe Mode (mode=pipe)"
.P
In this mode, \fBnclip\fP expects to enter a dialogue with the program
that called it, which is assumed to be a newsreader.   The program
takes commands on the standard input, assumed to be a pipe from the
newsreader, and gives back answers on the standard output, assumed to
be a pipe back to the newsreader.
.P
In this case, you don't actually run your \fBnclip\fP program.  Your
newsreader calls it for you and does all the talking to it that's
required.   We have adapted many newsreaders to work in this way, including
the popular RN newsreader.
.P
In general, the commands ask \fBnclip\fP to examine articles, and
the answers accept or reject the articles.  A typical newsreader would
filter all articles through the concurrent \fBnclip\fP process before
presenting them to the user.
.P
The actual command structure is beyond the scope of this chapter.  It
is documented in a special manual available free from Looking Glass
Software Limited.
.P
There are two things to be aware of here.  When a newsreader starts a
new newsgroup, it may query the filter program about the group in general.
This will cause a call to \fBstartgroup\fP.   If you set the
special \fBaccept\_all\fP or \fBreject\_all\fP flags, this will be
communicated to the newsreader, which can then decide not
to filter more articles in that newsgroup.
.P
If your newsreader is the type that likes to do all its filtering right
at the start of a group, you will soon discover that you don't
want to filter all groups like this.
.P
The communication protocol also has a facility so that the newsreader
(or perhaps the user) can issue ``kill'' commands to the news filter.
Such commands would be intended to tell the filter to store strings like
message-ids and users in its databases.   The interpretation of these
commands is up to you.
.P
When such a command comes, the entry point \fBcommand\fP will be called,
with the command string as a single string argument.  You should check
and process the command.  If it is a valid command, terminate by
issuing an \fBaccept\fP statement.  If it is an invalid command, terminate
by issuing a \fBreject\fP statement.  The default is to reject.  If you
don't define a \fBcommand\fP procedure, all commands will be rejected.
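.P
Since the meaning of kill commands is left to you, the Python sketch below
shows only the general shape of a handler: recognize a command, update a
database, and accept; otherwise reject.  The command words ``killid'' and
``killuser'' are invented for this illustration and are not part of any
newsreader's protocol.
.Bb
```python
def command(text, killed_ids, killed_users):
    """Handle one kill command; True means accept, False reject."""
    words = text.split()
    # Hypothetical command words; the real vocabulary is up to you.
    if len(words) == 2 and words[0] == "killid":
        killed_ids.add(words[1])
        return True
    if len(words) == 2 and words[0] == "killuser":
        killed_users.add(words[1])
        return True
    # Anything unrecognised is rejected, matching the default.
    return False
```
.Be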
.P
More information on \fIpipe\fP mode may be included in the documentation
for readers that support it.  The interface is general, so that any kind
of news filtering program can be adapted to it -- not just those
compiled with the NewsClip system.

.H 2 "Using It"

.H 3 "Pipe Mode"
.P
The ideal mode of operation for NewsClip programs is a smart newsreader
that can talk to the program in \fIpipe\fP mode.  To do this, compile
your program as \fBnclip\fP (that's the default) and place it either
in the same directory as your \fB.newsrc\fP file, or in one of the
directories named in your \fBPATH\fP environment variable.   In
most cases, your home directory is the place.
.P
Then run your newsreader.  It should start up your \fBnclip\fP program
and talk to it for you.  There will be nothing for you to do.
.P
If your smart newsreader uses the standard \fB.newsrc\fP file, then you
can still run your program in \fInewsrc\fP mode as described below.  You
may find this is a handy way to save time.  Run the program on your
\fB.newsrc\fP at night or in the background.  This will scan articles
and update your \fB.newsrc\fP so that it's already done when you
start reading.
.P
This is particularly useful with large groups in which you reject almost
all the articles.

.H 3 "Newsrc Mode"
.P
If a smart newsreader is not available, or even if one is, you can
use your filter program with any newsreader that understands the
\fB.newsrc\fP file in \fInewsrc\fP mode.
.P
Set up your filter program and test it.  Then arrange to run it
regularly in the background with:
.Bb
nclip mode=newsrc
.Be
.P
It will check all the new articles, and get rid of the ones you don't
want.   Run this at night from your \fBcron\fP if possible.  Start
it up in the background from your \fB.login\fP or \fB.profile\fP
script when you log in to your system, and just wait a short time before
you start reading news.
.P
If new articles arrive during your newsreading session, your newsreader
will show them to you, of course, as they have not been filtered.  There
is little way around this.  If you complete a newsreading session, rather
than going around for a second session immediately, you should quit,
run your filter program again, and go back into the newsreader.   This
should help you avoid articles that should be rejected.  You will still
see the odd one, but that should not be a big deal.
.P
If you want to get fancy, you could leave groups unsubscribed, and use
the \fI+unsubscribed\fP option (see below) to only show those groups
to you after processing.   Unfortunately, you would need to unsubscribe to
all the groups at the end of your session, and there is no mechanism to
do this.
.P
One idea is to set up a ``las'' file with the names of your unsubscribed
groups, and then set up a special NewsClip program to search through them.
Run this program at night every few days with \fI+only\fP, \fI+unsubscribed\fP
and the \fIlas=\fP option.  As you only run this irregularly, you can do
things like full text searches for important keywords.
.P
The \fIpipe\fP mode system has been designed to be added simply to most
newsreaders.  Patches exist or are under development for many of the
popular newsreaders.

.H 3 "Options & Environment"
.P
Two options and environment variables let you specify where the
\fB.newsrc\fP and related files will reside in the modes that deal
with a \fB.newsrc\fP file.   (The options supersede the environment
variables.)

.H 4 "directory=dirpath"
.P
This option and the \fBDOTDIR\fP environment
variable let you specify the directory to look for the \fB.newsrc\fP,
\fB.newsrclas\fP and \fB.rnlock\fP files.  (The \fB.rnlock\fP file
is RN's way of ensuring two programs don't go at the \fB.newsrc\fP at once.)

.H 4 "newsrc=pathname"
.P
This option specifies an exact location for the
\fB.newsrc\fP
file.  The name of the \fB.newsrclas\fP file is generated by appending
``las'' to that name, so you should ensure there is enough room in the
filename to do this.  The \fB.rnlock\fP file is not used.

.H 4 "las=pathname"
.P
This lets you explicitly set the name of the last
article seen file.  This is handy with the \fI+only\fP option.
.P
Use this option when testing, when in \fIbatch\fP mode, or when dealing
with a file generated by the \fBmknewsrc\fP program.
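.P
How the file names are chosen can be sketched in Python as follows.  This
is a model under assumptions, not the program's actual logic: the text says
only that options supersede environment variables, so the relative order of
the remaining cases, and the fall-back to the home directory, are our
guesses.
.Bb
```python
def locate_files(options, env):
    """Pick the newsrc and last-article file names for a run."""
    if "newsrc" in options:
        newsrc = options["newsrc"]
    elif "directory" in options:
        newsrc = options["directory"] + "/.newsrc"
    elif "NEWSRC" in env:
        newsrc = env["NEWSRC"]
    else:
        # Assumption: fall back to DOTDIR, then the home directory.
        newsrc = env.get("DOTDIR", env.get("HOME", ".")) + "/.newsrc"
    # The las= option overrides the name made by appending "las".
    las = options.get("las", newsrc + "las")
    return newsrc, las
```
.Be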

.H 4 "option=string"
.P
This lets you specify options that will be
passed down to the NewsClip program.  The option strings are placed
in the global string array named \fBoptions\fP.  The user can import
this array and search for items in it.
.H 4 "+only"
.P
The \fI+only\fP option specifies that only those groups already named
in the \fB.newsrclas\fP file should be processed and filtered.  (This
only applies in the \fInewsrc\fP, \fIlist\fP and \fIbatch\fP modes.)
.P
Normally if the \fB.newsrclas\fP file is missing, or if subscribed
groups are not found within it, they are added with a default last
article seen of zero.   With the \fI+only\fP option, no new groups
will be added.
.P
This way, you can confine NewsClip processing to just a specific list of
groups.  You can also do this internally with the \fBaccept\_all\fP
and \fBreject\_all\fP variables.
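.P
The effect of \fI+only\fP on the list of groups can be modelled like this
in Python.  It is a sketch; the function name and the dictionary form of
the ``las'' data are hypothetical.
.Bb
```python
def groups_to_process(subscribed, las, only):
    """Return {group: last article seen} for the groups to filter."""
    selected = {}
    for group in subscribed:
        if group in las:
            selected[group] = las[group]
        elif not only:
            # Groups new to the las file start from article zero.
            selected[group] = 0
    return selected
```
.Be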

.H 4 "+unsubscribed"
.P
This option causes the program to process even the
unsubscribed groups found in the \fB.newsrc\fP file.  If any article
in an unsubscribed group is accepted -- this is assumed to be a rare case --
then the group will be resubscribed so that you see it in your next
newsreading session.
.H 4 "warning=level"
.P
Sets a warning level, currently from 0 to 4.  The default is 1.  The
higher the level, the more warnings you will get.   Warnings are printed
to the standard error output.
.H 4 "Spooldir=dirpath"
.P
Specify an alternate news spool directory.  This is for use by users
with a binary-only copy of NewsClip that use a machine with a non-standard
spool directory.
.H 4 "Libdir=dirpath"
.P
Specify an alternate news library directory.  This is for use by users
with a binary-only copy of NewsClip that use a machine with a
non-standard library directory.

.H 2 "Making .newsrc files"
.P
Compiling NewsClip programs is not difficult, and it's quite fast, assuming
your system's C compiler is of reasonable speed.  This means that
NewsClip can make an ideal language for special purpose scans of the
news spool directories.
.P
To help in this, we have created a special program called \fBmknewsrc\fP.
It can make up a sample \fB.newsrc\fP file for you to use as input to
\fBnclip\fP in \fInewsrc\fP mode, named using the \fInewsrc=\fP option.
.P
The \fBmknewsrc\fP program makes, by default, a \fB.newsrc\fP that
shows every article in every group on the system as unread.  This lets
you scan all the news spools with your filter program.
.P
In the end, you will get a modified \fB.newsrc\fP with only the desired
articles marked unread.  You can then point your newsreader at this new
\fB.newsrc\fP, perhaps with the \fBNEWSRC\fP or \fBDOTDIR\fP
environment variables that \fBnclip\fP also uses.  You will then get
a newsreading session of just the desired articles.   (Be sure to reset
your environment variables afterwards!)
.P
\fBmknewsrc\fP outputs the \fB.newsrc\fP style file on the standard
output.  You should redirect that where you want it.
.P
There are some useful options to \fBmknewsrc\fP to help you cut down
your search.  You will find them necessary, as a full search of a
large system's complete USENET spools can take
scores of minutes or even hours of disk I/O time.
.P
First of all, you may provide regular expression patterns as command
line arguments.  If you do, you will only be provided with newsgroups
that match those patterns.  For example, \fI^comp\\..*\fP would give
you all the groups in the ``comp'' hierarchy.  (Be warned that just
as with \fBgrep\fP, you will have to escape certain special characters
to save them from shell processing.)
.P
You can also ask to scan only the most recent articles in each selected
group.  With the \fIpercent=%age\fP option, you can specify a number
from 1 to 100 that tells what percentage of the available articles should
be marked unread.  (You always get the most recent set.)
.P
The \fI+newsrc\fP option restricts the output to groups that
are marked as subscribed in your own \fB.newsrc\fP.  The same rules and
environment variables that \fBnclip\fP uses to find your \fB.newsrc\fP
apply here.
.P
The \fInewsrc=filename\fP option implies \fI+newsrc\fP, and specifies where
the \fB.newsrc\fP is to be found.  The output is still written to
the standard output, however.
.P
Here are some typical steps, with an example following:
.AL

.LI
Write a newsclip program and compile it with \fBncc\fP.  You may want
to put the executable in a different place, for example \fBsrch\fP.
.LI
Build a temporary \fB.newsrc\fP file for half the articles in the
``comp'' groups.
.LI
Filter for the articles you like.
.LI
Read the news. Then reset \fBDOTDIR\fP if you set it.
.LE
.Bb
ncc srch.nc o=srch
mknewsrc p=50 '^comp\\..*' >/tmp/me/.newsrc
srch m=n n=/tmp/me/.newsrc
setenv DOTDIR /tmp/me
rn
setenv DOTDIR $HOME
.Be




.H 1 "Tips and Traps"
.P
In this chapter, we remind you of some important things to remember when
coding your NewsClip programs.  In particular, important differences from
C are pointed out.

.H 2 "Memory"
.P
Don't create any loops that keep allocating strings -- for example with
\fBconcat\fP.   Temporary memory is just allocated in a big stack, and
it is never freed up until an article is done.  A loop could easily make
you run out of memory, aborting your session.
.P
Naturally, be equally careful of permanent memory that you allocate in
databases and permanent strings.  Be sure to free all databases that
you are not using.  (This is not necessary within the \fBterminate\fP
procedure.)
.P
Remember, when you read a database in from a file, you still get a database
that uses some memory, even if the file is missing or empty.

.H 2 "Strings"
.P
Make sure all your search strings are in lower case letters, unless you
know you are searching a text field or string that has not been converted
to lower case.  Normally almost all such things are pre-converted to
lower case, so if you put upper case in your patterns or test strings
you will not get a match.

.H 2 "Integers"
.P
If your machine only supports 16 bit integers, you can only place values
from -32768 to 32767 in your integers.  It is very easy to overflow.
In fact, in some newsgroups, the article numbers may already overflow
your integers.
.P
One place to watch out is the running \fIscore\fP that you modify with
the \fBadjust\fP statement.   If you adjust the score beyond the range
of an integer, it could wrap around, causing exactly the wrong result.
.P
Make sure your adjustments are appropriate, and not so large that they
might overflow if they all go the same way.  If you are worried that
you might reach overflow at a given point, import the \fBscore\fP variable
and put the following statement in at various points in your procedure.
.Bb
if( score > 25000 || score < -25000 )
	return;
.Be
This will stop the process if the score gets ridiculously high or low.
.P
Some of the functions returning large things like
article sizes will compensate for small integers by returning the
largest integer (i.e. 32767) when the actual result is out of bounds.
You may wish to watch for this if you were counting on an exact result.
.P
Date/time variables will always be able to hold more than a 16 bit integer,
but their use as anything but date values is discouraged.

.H 2 "Nil Headers"
.P
If you use any array, userid or string header variables that are not
guaranteed to be in an article, then you should always check to make
sure the variables don't have a nil value before you use one.  If
you assign into some index of a nil array, you could get into real
trouble.
.P
Usually you do this with a short circuit \fB&&\fP operator, as in:
.Bb
if( keywords != nilarray && "rot13" in keywords )
	reject;
.Be
.P
With integer and date variables, you will only get a zero value, so it
may not be absolutely necessary to check, but it is still always a good
idea.


.B "Important Note:"
.P
Remember this:  A nil array is not the same as an empty array.  A nil string
is not an empty string (\fB""\fP).  If you use variables that might
be nil, beware.
.P
In general, it's a good idea to use variables that can't be nil,
such as \fBrdistribution\fP.   You can also make your own functions to
do certain tasks for you.  For example:
.Bb
string
safestring( string s )
{
	return s == nilstring ? "" : s;
}
.Be
could be applied so that nil strings always become empty strings.  You
could also define this as a CPP macro, or just use the \fB?\fP query
operator wherever necessary.
.P
The nil values are important, as they let you test if a header field was
present in the article at all.  In some cases, such as the \fBApproved:\fP
header, the important thing is that the header is present.  Currently, at
least, it doesn't matter what's in it.
.H 2 "Cross Posting"
.P
If you are writing a program to be used in \fIbatch\fP mode, be sure to
include the declaration:
.Bb
extern string array xref;
.Be
somewhere in your program, or use some other system to avoid duplicate
articles.
.P
You may want to do this \fBextern\fP even in \fInewsrc\fP mode, to
simplify processing.   It does nothing in the processing modes that don't
work with a \fB.newsrc\fP file, like the \fIpipe\fP and \fIfilter\fP modes.
.P
Another way to eliminate crossposts is to reject all articles where the
first newsgroup in the \fBnewsgroups\fP array is not the current newsgroup,
so long as that first newsgroup is a \fBnewsrc\_group\fP.  (If it isn't,
you will want to key on the first newsgroup in the array that is found
in the \fB.newsrc\fP.)

.H 2 "Speed"
.P
Don't import externals that you don't need.  Sometimes just importing
an external variable requests pre-processing that takes time.
This applies to
all the header variables, along with \fBdistribution\_level\fP and some
of the statistical variables.
.P
Be conservative with your use of references to segments of the article
body.  This can involve lots of disk I/O if you have lots of articles
to scan.  We advise that you keep body scans to your newsgroup specific
code.  If you have a body scan for every article, you can expect the
program to take a lot more time.   Of course, NewsClip is quite fast,
so this may be acceptable, particularly if it saves \fIyou\fP time.
.P
Try to use the variables like \fBlines\fP and \fBarticle\_bytes\fP
that don't usually require the reading of the whole article.   Note
that \fBarticle\_bytes\fP sometimes does have to read the whole article
when you are running in pipe mode on a system that doesn't have the
news article files.
.P
In general, your code is getting compiled to C, and thus directly to
machine code.  Don't be afraid of loops and integer operations in your code.
They should go quite quickly.
.P
Optimize where you can with the use of the \fI+only\fP option or the
\fBreject\_all\fP and \fBaccept\_all\fP variables.   Try the \fBnamed\_group\fP
trick described in the chapter on general technique.
.P
Stick to simple patterns where possible -- they search faster.  Also,
use constant patterns where you can.  When your NewsClip program is
run, your constant patterns (quoted strings to the right of a \fBhas\fP
operator) get converted into the internal regular expression language only once,
instead of each time a search is done.
.P
In particular, the or-bar (\fB|\fP) regular expression feature is not very
efficient.  It can often be significantly faster to code:
.Bb
body has "foo" || body has "bar" || body has "abc.*def"
.Be
than
.Bb
body has "foo|bar|abc.*def"
.Be
particularly if you put the most likely patterns first.

.H 2 "Patterns"
.P
Do be sure to watch out for the regular expression ``metacharacters.''
These are ``\fB^$.[]()+?|\\*\fP''.  If you're an \fBed\fP or
\fBgrep\fP user, this will be second nature to you, although you
should still watch out for the extra \fBegrep\fP characters, particularly
the parentheses, plus, question mark and or-bar.
.P
If you wish to store a literal string in an array or database for later
use in searching, you may wish to apply the string function
\fBliteral\_pattern\fP to it.   This is always wise if you're taking
something like a subject line, which could contain all sorts of
characters.
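.P
As a rough sketch of this idea, the program below shows only the first
article seen on each subject in a session.  The database name
\fBseensubjects\fP and its file name are invented for the illustration,
and the declaration of \fBliteral\_pattern\fP is an assumption modelled
on the other string functions; check the function reference for its
exact form.
.Bb
database seensubjects;		/* hypothetical name */
extern string subject;
extern string literal\_pattern( string );	/* assumed declaration */
procedure init()
{
	seensubjects = read\_database( "~./News/seensubj" );
}
procedure article()
{
	/* reject anything whose subject matches a stored pattern */
	reject if subject has seensubjects;
	/* otherwise remember this subject, metacharacters made literal */
	seensubjects[literal\_pattern( subject )] = true;
}
.Be
Without the call to \fBliteral\_pattern\fP, a subject such as
``c++ (was: c)'' would be stored as a regular expression, and would
probably not match what you expect.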

.H 2 "Databases"
.P
If you regularly search for a string array in a database, such as the
popular search for \fBreferences\fP in a database of bad message-ids, then
only the first entry found will get its ``access time'' updated.  Even if
every ID in the \fBreferences\fP array is in the database, only the first
one found is marked as accessed.
.P
This means that the later IDs will eventually fade away from the database.
This should not present a problem, since they will all be children of
the parent ID in normal circumstances.
.P
If this could cause a problem, you will have to write your own \fBin\fP
function, which performs a loop, and doesn't stop after an entry is found.
This will update all entries, but it might take a bit longer.
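.P
Such a function might look something like the sketch below.  It assumes
that NewsClip accepts a C-style \fBfor\fP loop and local integer
declarations, and that testing a single string with \fBin\fP updates
the access time of that entry.  The function name and the
\fBbadarticles\fP database are only examples.
.Bb
database badarticles;		/* example database of bad message-ids */
extern string array references;
int
touch\_all\_refs()
{
	int i;
	int found;
	found = 0;
	/* never index a nil array */
	if( references == nilarray )
		return 0;
	for( i = 0; i < count( references ); i = i + 1 )
		if( references[i] in badarticles )
			found = 1;	/* keep looping, no early exit */
	return found;
}
.Be
You could then write \fBreject if touch\_all\_refs();\fP in place of
the usual test with the \fBin\fP operator.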

.H 2 "Working With Newsreaders"
.P
Some newsreaders, like RN, have a powerful macro language.  You will find
that it is possible in RN to define macros that will do automatic updates
of your databases of bad messages, bad users, good or bad subjects or
whatever you please.   If you build your NewsClip program from a
series of \fB#include\fPd group files, you can even set up macros to
do automatic edits of those files when desired, and then recompile the
whole thing with a \fBMake\fP file.  See the RN manual for details.
.P
You can also issue commands to your NewsClip program
directly from a modified reader like RN.  See our special appendix on
that topic.

.H 2 "Kill Files"
.P
Exactly duplicating the kill file interface of RN is not simple, although
it can be done.  The interface in NewsClip is, of course, much more
flexible.  RN's kill files can issue commands on articles that match
patterns in the subject line, entire header or body.   It's
easy to do pattern searches in the subject or article body with NewsClip.
You can't search the entire header, but the RN header search was only
provided to simplify the KILL file interface.
.P
If you want something that's like a kill file, just read a local KILL
database for your newsgroup and say:
.Bb
reject if subject has killdb;
.Be
or
.Bb
reject if body has killdb;
.Be
If you want to keep it all in one database, you could read in the
database, and then do a loop splitting the database into a bunch of
different arrays or databases of patterns, using the integer key values.

.H 2 "Variant Parsing"
.P
You may not wish to have your header lines handled the same way in
every newsgroup.  For example, in one newsgroup you might wish the
\fBkeywords\fP line to be delimited with spaces, and in another you
might wish commas.  (Normally it uses commas.)
.P
You can't do that with the normal header variable declaration system,
as the parsing of the header variables is done before you get to process
the article yourself.
.P
The solution is to define your header variables as simple strings, as in:
.Bb
header string keywords : "keywords";
.Be
and then parse the string yourself.  For example:
.Bb
string array keys;
switch( main_newsgroup ) {
	case #rec.humor.funny:
		parse keys = keywords, "S,";
		accept if "laugh" in keys;
		break;
	default:
		parse keys = keywords, " ";
		if( keys has "^foo" )
			adjust 20;
		break;
	}
.Be

.H 2 "Feeding Sites"
.P
If you use NewsClip's \fIbatch\fP mode to feed other sites (or users)
from a \fB.newsrc\fP file, you must be sure to include the group
``control'' in the list of subscribed groups.  This will pass control
messages (cancellations of articles etc.) to your feed site.
.P
While it should usually do little harm to pass all control messages, you
may wish to filter them further.  The ``control'' group is unusual, in
that the groups on the \fBNewsgroups:\fP line will not include \fBcontrol\fP,
but will rather be the groups to which the control message applies.
.P
You may wish to forward control messages only if they include a group you
already subscribe to.  The \fBnewsrc\_group\fP function tells you if a group
was one of those listed in the \fB.newsrc\fP file.  You may also wish
to include hierarchies of control messages to catch new group creation
messages.  You may wish to filter out boring ``ihave/sendme'' protocol
control messages by looking at the control line.
.P
Newsgroup creation messages get posted to the special pseudo-group,
``\fIgroupname\fP.ctl.''  Thus the creation message for ``comp.misc''
was ``posted'' to ``comp.misc.ctl'' -- watch for that.  Special control
messages may also be posted to fake groups that end in ``.ctl.''  This
means you may wish to use pattern matching on your newsgroup names instead
of the usual exact match schemes.
.P
If you catch a creation message that you want to propagate, you may also
wish to add the created group to your \fB.newsrc\fP file.  Use the
\fBsubscribe\fP procedure to do this.
.P
Feeding with a \fB.newsrc\fP has some powerful advantages.  For example,
it's easy to have a complex subscription list.  You can even combine together
all the \fB.newsrc\fP files from the remote site, add ``control'' and build
a file that only sends what is actually read.

.H 2 "Examples"
.P
Here are some examples of how to code for common actions.  Some of these
examples are conditional expressions, which you can then use in \fBif\fP,
\fBreject if\fP or \fBaccept if\fP statements, as desired.   In most
cases, these examples are code fragments, and not complete programs.  It
is assumed that they exist within larger programs.  (For example it's
pointless to have a program that just does \fBaccept if\fP, as \fBaccept\fP
is the default action.)

.H 3 "My Own Articles"
.P
To see your own articles and all followups to them:
.Bb
database myarticles;
extern string message\_id;
extern userid from;
extern string array references;
procedure init()
{
	myarticles = read\_database( "~./News/myarts" );
}
procedure article()
{
	extern string my\_mail\_address;
	if( from == my\_mail\_address ) {
		myarticles[message\_id] = true;
		accept;
		}
	if( references != nilarray && references in myarticles )
		accept;
	/* more code */
}
procedure terminate()
{
	extern datetime time\_now;
	write\_database( myarticles, "~./News/myarts", time\_now - month );
}
.Be

.H 3 "Local Articles"
.P
Show me articles by people from my site:
.Bb
extern userid from;
{procgap}
extern string my\_domain;
extern string domain( string );
accept if domain( from ) == my\_domain;
.Be
.H 3 "Locally Distributed Articles"
.P
Show me articles posted for citywide distribution or smaller:
.Bb
extern int distribution\_level;
extern int dlevel( newsgroup );
accept if distribution\_level <= dlevel(#city);
.Be
.P
You may want to filter by distribution based on the group.  In some groups
you might want to read the whole netwide stream, and in others you might
want to read only the local stream.  In some groups, you might even want to
eliminate the local stream.
.H 3 "Crossposting"
.P
An article might be considered too heavily crossposted if
\fBcount(newsgroups) > 4\fP.  On the other hand, you might decide in
some groups to only read articles unique to the group with:
.Bb
case #news.admin:
	reject if count(newsgroups) > 1;
	break;
.Be
.P
You might want to be a bit more lenient than that.  The following code:
.Bb
extern newsgroup main_newsgroup;
reject if main_newsgroup != newsgroups[0];
.Be
rejects articles where the primary newsgroup isn't the one you
are currently processing.  Such messages were probably posted to your
group as an afterthought.  You might wish to give them a lower
score or reject them out of hand.   Of course, if you do subscribe to
the primary newsgroup (first on the \fBnewsgroups\fP list), then you
will still see the article in that group.  If you don't subscribe, you
won't see it at all.
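.P
If rejection seems too harsh, the same test can lower the score instead.
This is only a sketch: the penalty of 10 is an arbitrary figure, and it
assumes that \fBadjust\fP accepts a negative amount.
.Bb
extern newsgroup main\_newsgroup;
/* penalize, rather than reject, probable afterthought crossposts */
if( main\_newsgroup != newsgroups[0] )
	adjust -10;
.Be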
.H 3 "Eliminating a User"
.P
You can eliminate a list of users from ``your'' net, so that you don't
see their articles, and you don't even see followups to their articles.
.Bb
database badusers;
database badarticles;
extern string message\_id;
extern userid from;
extern string array references;
procedure init()
{
	badusers = read\_database( "~./News/badusers" );
	badarticles = read\_database( "~./News/badarts" );
}
procedure article()
{
	/* does it come from a nasty user?  Mark it */
	if( from in badusers ) {
		badarticles[message\_id] = true;
		reject;
		}
	reject if references != nilarray && references in badarticles;
	/* more code */
}
procedure terminate()
{
	extern datetime time\_now;
	write\_database( badarticles, "~./News/badarts", time\_now - month );
}
.Be
.H 4 "\fIReally\fP Eliminating a User"
.P
There are still many sites out there that don't build proper
\fBreferences\fP chains on their articles.   To really eliminate followups
to an article, you have to do more than add the message id to a database of
bad messages.   If the article is an original, with no ``Re:'' at the
front of the subject, you should also add the subject line to a
database of bad subjects.
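.P
One way to sketch this is to extend the \fBarticle\fP procedure of the
previous example.  The \fBbadsubjects\fP database is a name made up
here, and it is assumed to have been read in the \fBinit\fP procedure
like the others.  The test for ``re:'' is in lower case on the
assumption that \fBsubject\fP has been converted to lower case, and the
declaration of \fBliteral\_pattern\fP is likewise an assumption.
.Bb
database badsubjects;		/* hypothetical, read in init() */
extern string subject;
extern string literal\_pattern( string );	/* assumed declaration */
/* inside procedure article(): */
if( from in badusers ) {
	badarticles[message\_id] = true;
	/* an original article: remember its subject as well */
	if( !(subject has "^re:") )
		badsubjects[literal\_pattern( subject )] = true;
	reject;
	}
reject if references != nilarray && references in badarticles;
/* catches followups that lack a proper references chain */
reject if subject has badsubjects;
.Be
If you keep the bad subjects between sessions, write the database out
in \fBterminate\fP as is done for \fBbadarticles\fP.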
.P
And if you want to get really fancy, you could have your program search
article bodies for mentions of the user's name.
.H 4 "If you Eliminate a User"
.P
If you decide that you would be better off eliminating the postings of
a USENET user, it would be a good idea to send a brief mail note to this
user indicating that you have done so, possibly including the reason
why.
.P
Some users who make annoying mistakes on USENET may not realize that
they are making mistakes, or they may not realize the extent to which
they are annoying people.  If they are informed that some readers have
decided to read no more of their writing, they may decide to change
their behavior.  That is up to the poster, of course.
.H 3 "Included Text & Signatures"
.P
You may not like long rebuttal articles with lots of included text.
In some groups, you could then include:
.Bb
extern int lines;
{procgap}
/* multiply instead of dividing, so no included text can't divide by zero */
reject if lines > 50 && line\_count( included ) * 2 > lines;
.Be
which rejects long articles that are more than half included text.
.P
You could also reject (or lower the score of) articles that are short
and have big signatures.
.Bb
extern int lines;
{procgap}
reject if lines < 30 && line\_count( signature ) > 9;
.Be
To get fancy, you could have an \fBif\fP statement add the posters of
such articles to your \fBbadusers\fP database (see above) so that you
never hear from them again!   In this case you would have to write out
your \fBbadusers\fP database at the end of the session.
.H 3 "Followups"
.P
In some groups, it's better to just ignore the followups.   Try
.Bb
extern int followup;
/* big group switch */
case #rec.humor:
	reject if followup;
	break;
.Be
You might not be so harsh, but instead just lower the score or apply
further tests before allowing followups to make it through.
.P
Another idea is to ignore followups except in the main group on the
newsgroup list.  Try this:
.Bb
extern int followup;
extern newsgroup main\_newsgroup;
reject if followup && main\_newsgroup != newsgroups[0];
.Be
.H 3 "Two Out of Three Ain't Bad"
.P
You can use integer arithmetic in combination with the fact that
conditional expressions return 1 for true and 0 for false.  To accept
an article that has 2 out of 3 keywords in the subject:
.Bb
extern string subject;
{procgap}
accept if (subject has "baz") + (subject has "bar") + (subject has "foo") > 1;
.Be
.H 3 "Patterns of Groups"
.P
You can get pretty fancy with what you do with crossposted articles.  In
fact, with the right use of NewsClip, crossposting could be a good
thing.  Say you want to only see space articles that also pertain to
astronomy.   You could either use \fBis sci.space && is sci.astro\fP in
a general expression, or if you use a \fBswitch\fP, you could say:
.Bb
case #sci.astro:
	reject if !is sci.space;
.Be
Likewise you could say:
.Bb
case #rec.humor:
	reject if is talk.bizarre;
.Be
to eliminate only the messages crossposted to that other group.  No
doubt \fBreject if is comp.sys.atari.st && is comp.sys.amiga\fP will
be popular!  Likewise, if people are kind enough to crosspost to
``alt.flame'', that lets you control whether you read the article or not.
.P
Use boolean logic on groups to your heart's content.



.H 1 "Debug & Testing"
.P
All programs of any complexity will have bugs, and yours will be
no exception.   Your bugs may simply cause articles to be accepted or
rejected improperly, or they may cause your filter program to crash,
either through an infinite loop or an exception.
.H 2 "Segmentation Fault"
.P
The most frustrating thing to see can be the message ``segmentation fault.''
(Sometimes ``memory fault.'')
This means, on Unix, that your program has tried to use memory
improperly.   This is often the result of an attempt to reference an
array, string or userid that has a \fBnil\fP value.
.P
You must remember that before you ever reference data in an array or
string that might not be defined, you must check that it is defined.
.P
There is a difference between \fBnilstring\fP and the empty string
(\fB""\fP).  For example, if you use the \fBsummary\fP header variable,
it will be \fBnilstring\fP if the header wasn't there, and \fB""\fP if
the header was there, but the summary was blank.
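.P
A guarded test on such a header might be sketched as follows.  The
pattern ``spoiler'' is only an example, and the \fBextern\fP
declaration assumes that \fBsummary\fP is imported like the other
string header variables.
.Bb
extern string summary;
/* the nil check comes first, so "has" never sees a nil string */
reject if summary != nilstring && summary has "spoiler";
.Be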
.P
The same is true for nil arrays.  \fBnilarray\fP isn't the same as an
array with no elements.   For your protection, the current release of
NewsClip has the \fBin\fP and \fBhas\fP operators treat \fBnilarray\fP
as an empty array, but this is not guaranteed to work in future releases.
.P
We do allow a nil database to be the same as an empty database when it
comes to looking in the database, but you can't use a nil database for
storing into -- you could get that ``segmentation fault.''
.P
Other causes of this error include: array indices that are out of bounds,
or a character index beyond the end of a string.
.P
Always beware of the most common cause, which is the use of a variable
that has not yet been assigned a value.

.H 2 "Debuggers"
.P
If you can't figure out the immediate cause of a problem like this, and
you are a C programmer, Unix has many debugging tools available to help
with this sort of problem.
.P
The C source produced by \fBncc\fP is fairly readable, and you should
be able to readily tell what line of the output C program corresponds
to a statement in your NewsClip program.   Use the \fI-l\fP option
of \fBncc\fP to generate a standalone C program.  You can then
compile and link it with the \fBnewsclip.a\fP library yourself, using
whatever debug options you desire.

.H 2 "Dprintf"
.P
NewsClip contains a special procedure called \fBdprintf\fP.  This acts
just like the \fBprintf\fP function from C, except it prints to the
standard error output.  It takes a variable number of arguments, from
1 to 5.  These can be strings, ints or dates.  See the man page for
\fBprintf\fP for full details.
.P
Insert debugging print statements in your programs so you can figure out
what's going on and what values are being assigned to variables.
.P
Please note that you can't print variables of type \fBnewsgroup\fP
or \fBuserid\fP.  Assign such values to strings first.  Alternatively, you 
can print newsgroups with the ``%d'' code, which will give the newsgroup 
number.
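.P
For instance, a trace line along these lines could go near the end of
your \fBarticle\fP procedure.  It is a sketch that assumes \fBscore\fP
is imported as an \fBint\fP and that \fBsubject\fP is never nil; the
format text itself is arbitrary.
.Bb
extern int score;
extern newsgroup main\_newsgroup;
extern string subject;
/* the newsgroup is printed as a number with %d */
dprintf( "group %d score %d subject %s\\n", main\_newsgroup, score, subject );
.Be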

.H 2 "Warning Level"
.P
You can set the warning level for your NewsClip programs with the
\fIwarning=num\fP option.   Provide a number.  The higher the number,
the more warnings you get.   The default level is 1, and currently
warnings exist at levels 0 through 4.   Select a high number like 100
to get all warnings.
.P
You will be warned about conditions that are normally considered OK,
such as the reading of a non-existent database file, but you may also
learn some useful debugging information.

.H 2 "Trial Runs"
.P
To test and debug your programs, use the \fIfilter\fP or \fIlist\fP modes
of operation.  We suggest \fIfilter\fP for preliminary testing.
.P
To do this, prepare a list of article filenames, either with articles
made up by you or live articles on your system.  Use absolute pathnames
if possible.   Start perhaps with only one article in the list.  Run:
.Bb
nclip m=filter <list
.Be
on an \fBnclip\fP program that is full of \fBdprintf\fP statements.  You
should quickly be able to see what's happening as your program runs, and
find out how to fix it.
.P
If you start working on larger lists of files, include a statement like:
.Bb
extern string article\_filename;
dprintf( "%s\\n", article\_filename );
.Be
at the start of your \fBarticle\fP procedure, so that you know what article
your program is working on when it goes wrong.
.P
This variable is also a good one to look at if you're using a debugger.
.P
Later, you may be ready for the \fIlist\fP mode or even the \fInewsrc\fP mode,
perhaps in combination with the \fInewsrc=\fP option.  We suggest the latter
option, as you should not run test programs on your real-live \fB.newsrc\fP
file.
.P
Even if your machine does not keep news around, and you only use
NewsClip programs in combination with newsreaders that know how to
talk to them, you can still make up your own sample articles (or have
your newsreader save out article files on your own machine) and run tests
on them as you please.

.H 2 "Debugging in Pipe Mode"
.P
It can be very difficult to debug a problem that only develops when your
program runs in communication with a newsreader.  The only indication you
may get of a problem is that the news filter process stops running, and commands
to it fail to work.  A good news reader will inform you that the filter has
died, but this may go by so quickly on the screen you can't spot it.
.P
To debug in pipe mode, you must set the environment variable \fBNCLIPDEBUG\fP
to ``truepipe''.   Then run a newsreading session.  This will leave two
files in your ``dot'' directory (the directory where the \fB.newsrc\fP is)
named \fBinpipe\fP and \fBpipelog\fP.   The \fBpipelog\fP file is
an expanded log of the news filter's discussions with the newsreader.
Look at the end of it to see what your filter was doing when it died.
.P
The \fBinpipe\fP file is the most important for duplicating the problem.
Copy it somewhere safe, as future sessions might overwrite it.  Say you
put it in \fBbadbug\fP; start up your debugger on your \fBnclip\fP program
and run it with the arguments:
.Bb
mode=pipe <badbug
.Be
This should cause your reading session to be duplicated, so long as no
news that you processed has since expired.  You will see the news filter's
pipe responses to the newsreader printed on the standard output.  Most
importantly, your filter program will now fail inside the debugger, where
you can track down what is going on.
