/*
  Copyright Dave Bone 1998 - 2014
  All Rights Reserved. 
  No part of this document may be reproduced without written consent from the author.
	
FILE:		epsilon_rules.lex
Dates:		23 Sept. 2005
Purpose:	determine epsilon rules.
How:	Read the rules' tree in prefix order filtering out:
			::, rules-phrase, identifier, NULL
		adding rules to yes or no piles or maybe list
*/
/@
@i "/usr/local/yacco2/copyright.w"
@** |epsilon_rules| grammar.\fbreak
Determine and check for the following:\fbreak
\ptindent{1: epsilon rules}
\ptindent{2: rules not deriving any string}
\ptindent{3: referenced rules are defined}
\ptindent{4: recursion cycling deriving only epsilon}
\ptindent{5: ambiguous subrules caused by multiple recursion cycling}
\fbreak
Epsilon definition:\fbreak
A rule having a subrule with no grammar symbols present:
ie, it derives an empty symbol string.
This is explicit epsilon.
A second condition is when a rule's subrule 
has all its symbols as rules and they are
all epsiloned. I call this transient epsilon. It doesn't matter
how the condition is arrived at.
Now a third condition is the ``invisible-shift'' operator usuage.
Though it's not part of the token stream per say it is normally used
to get out of ambigous situations and so is part of the lookahead set. 
Once apon a time i thought this was an implicit epsilon: IT IS NOT.\fbreak
\fbreak
Some epsilon terminalogy:\fbreak
Noun: epsilon represents the empty string of symbols.
The Greek flavour ($\epsilon$) represents it visually.\fbreak
Adjective: epsilonable - has the empty symbol string condition
and non-epsilonable generates terminal strings.\fbreak
Verb: shake and bake the tenses.\fbreak 
Verbal: epsiloned.\fbreak
\fbreak
This grammar demonstrates the elegance of flow control.
Review of a grammar's flow control:\fbreak
This depends on how the grammar tree
is walked. This grammar receives its
tokens in prefix formation: 
a ``rule-def'' token is processed before
its ``subrule-def''.
Using sentences, a subrule's expression will be processed before its left-handside rule.
This means the dependencies are in bottom-up left-to-right order of 
token recognition.
One can use the prefix order to initialize the various conditions, and 
postfix the results from the bottom-up order.
A grammar of no tokens can be programmed using all epsiloned rules
whereby the rules act as logic triggers.
In this paricular example, the grammar was originally 
written to detect ``epsilon'' rules.
Now it detects whether all rules derive T. The grammar recognition
just loads up into a 2 dimensional list the rule x subrule x element.
Once the language is accepted, it then traverses the list to derive
the terminal strings per subrule.\fbreak
\fbreak
The Algorithm.\fbreak
Pass one: fill up the derives list\fbreak
Fill up the triplet list of all subrules: rule $\otimes$ subrule $\otimes$ element.
One can view it as a 2 dimensional array where the row is subrule dominent
with its subordinent element columns. The rule just tags along providing
the parental anchor. 
The grammar tree is walked top-down (prefix order) to 
load up the derives list. 
Only the ``rule-def'' and ``subrule-def'' come in as tokens
while the elements are filtered out.
This was done for efficiency reasons.
Within the grammar at the ``subrule-def'' logic point,
the ``subrule-def''  start element is fetched by
syntax directed code.
\fbreak
\fbreak
Pass 2: derive T for each subrule\fbreak
Try to derive terminal strings for each indeterminate subrule.
This stage does multilpe passes thru the derives list  
whereby each
pass is predicated on the fact that the previous pass
has a change in the status of a subrule 
--- advance to the next element or 
the subrule's elements have been consumed.
How is this a 2 dimensional walk?
The subrules in the list are the dominant axis while its elements are the
secondary axis. This is a dynamic grid of points being consumned. 
The final outcome is either the grammar's subrules all derive
terminal strings and the derives list is empty
or the remainder of items are
the pathological subrules.\fbreak
\fbreak
Examples of Pathological conditions:\fbreak
Rule {\bf A} calls itself or 
a variation of {\bf A} calls {\bf B} calls {\bf A} without
any terminal strings in their subrules.\fbreak
\fbreak
Please look at the following test grammars that exercise the pathos...\fbreak
\ptindent{1: |ts_path1.lex|}
\ptindent{2: |ts_path1a.lex|}
\ptindent{3: |ts_path2.lex|}
\ptindent{4: |ts_path3.lex|}
\ptindent{5: |ts_path4.lex|}
\ptindent{6: |ts_path5.lex|}
\ptindent{7: |ts_path6.lex|}
\ptindent{8: |ts_path7.lex|}
They are exercised by |qa_alltests.bat| batch command file.
@*2 Recursion and derives a string.\fbreak
Derives:\fbreak
Greek symbols are used to denote a mixed string of symbol(s) drawn from
the nonterminal, terminal vocabularies. The empty string is
represented by $\epsilon$. 
Derives is a relationship that starts with a string of symbols and substitutes 
 equivalent strings of symbols until eventually nomore substitutions can take place.
 The resulting string is either empty or a string of terminals.
 The order of substitution will be from left to right.
The $\longrightarrow$ symbol represents the derives relation. 
Sometimes i will subscript the greek symbol
with $\epsilon$ denoting that it derives the empty string.
The $\not\longrightarrow$\ symbol represents ``no derives'' takes place.
I use this where a specific subrule pattern is being 
discussed and the greek symbol is not present.
 

\fbreak
Left recursion: A \subrule \ A $\beta${} \fbreak
As shown, the rule {\bf A} calls itself from the left part of its subrule.
$\beta${} represents strings of Rules and Terminals\fbreak
\fbreak
Right recursion: A \subrule \  $\alpha${} \  A \fbreak
The rule {\bf A} calls itself from the end of the subrule as shown.
$\alpha${} represents strings of Rules and Terminals.\fbreak

@*2 Pathological Grammars.\fbreak
The below pictures grids the good the bad and the ugly.
\fbreak
Note: the emitted code contains some imperfections caused by |cweave|.
When i have time i'll deal with |cweave|. Reader beware.
\fbreak
\fbreak
@/

fsm	
(fsm-id "epsilon_rules.lex"
,fsm-filename epsilon_rules
,fsm-namespace NS_epsilon_rules
,fsm-class Cepsilon_rules{
  user-prefix-declaration
#include "yacco2_stbl.h"
using namespace NS_yacco2_terminals;
#define EPSILON_YES 0
#define EPSILON_NO 1
#define EPSILON_DONT_KNOW 2
#define EPSILON_MAYBE 3

struct elem_list_type{
  rule_def* rule_;
  T_subrule_def* subrule_;
  AST* elem_t_;
  int gen_epsilon_;
  int gen_t_;
  elem_list_type
	(rule_def* Rule
    ,T_subrule_def* Subrule
    ,AST* Elem_t){
    rule_ = Rule;
    subrule_ = Subrule;
    elem_t_ = Elem_t;
    gen_epsilon_ = EPSILON_DONT_KNOW;
    gen_t_ = false;
  };
  elem_list_type(){
    rule_ = 0;
    subrule_ = 0;
    elem_t_ = 0;
    gen_epsilon_ = false;
    gen_t_ = false;
  };
};
  ***
  user-declaration
    public:
		std::list<elem_list_type > derives_list_;
		rule_def* rule_def_;
		T_subrule_def* subrule_def_;
		AST* elem_t_;
		tok_can< AST* > * ip_can_;
       void deal_with_derives_list();
	   void deal_with_undecided_derives_list();
	   AST* advance_element(AST* Elemt);
  ***
  user-implementation       
  AST* Cepsilon_rules::advance_element(AST* Elemt){
	AST* nxt_t = AST::brother(*Elemt);// next element
    for(;;){// bypass thread expression: go to jail ... eosubrule
  using namespace NS_yacco2_T_enum;
	 CAbs_lr1_sym* sym = AST::content(*nxt_t);
	 switch(sym->enumerated_id__){
	   case T_Enum::T_T_identifier_: break;  
       case T_Enum::T_T_2colon_:  break; 
       case T_Enum::T_T_NULL_: break;
       case T_Enum::T_T_cweb_marker_: break; 
       case T_Enum::T_T_cweb_comment_: break; 
       default: return nxt_t;
      } 
	  nxt_t = AST::brother(*nxt_t);// skip thread expr: next element
     }  
  }
  void Cepsilon_rules::deal_with_derives_list(){
  // force cweave to 
  using namespace NS_yacco2_T_enum;
         bool has_list_chged(false);
         std::list<elem_list_type >::iterator i; 
         Read_list:;
            i = derives_list_.begin();
	if(i == derives_list_.end()) return;
	for(;i != derives_list_.end();){ // force cweave
	  elem_list_type& el = *i;
	  CAbs_lr1_sym* rteos = AST::content(*el.elem_t_);// force cweave
	  switch(rteos->enumerated_id__){// force cweave
		case T_Enum::T_refered_rule_:{
		  refered_rule* rr = (refered_rule*)rteos;
		  rule_in_stbl* rstbl = rr->Rule_in_stbl();
		      using namespace yacco2_stbl;
			T_sym_tbl_report_card report_card;
			get_sym_entry_by_sub(report_card,rstbl->stbl_idx());
			if(report_card.status_ == T_sym_tbl_report_card::failure){
			 report_card.err_entry_->set_rc(*rr,__FILE__,__LINE__);						
			 parser__->add_token_to_error_queue(*report_card.err_entry_);
			 return;// caused by problem with sym. tbl
			}
			if(report_card.tbl_entry_->defined_ == false){
			 CAbs_lr1_sym* sym = new Err_used_rule_but_undefined;
			 sym->set_rc(*rr,__FILE__,__LINE__);
			 parser__->add_token_to_error_queue(*sym);
                         // keep pouring them into the error queue
			 has_list_chged = true;// go to next element
			 el.elem_t_ = advance_element(el.elem_t_);
			 ++i;// next sr in derive list
			 break;
		        }			
		        rule_def* r = rr->Rule_in_stbl()->r_def();
		        if(r->epsilon() == YES){
			 has_list_chged = true;
			 el.elem_t_ = advance_element(el.elem_t_);
			 ++i;// next sr in derive list
			 if(el.gen_epsilon_ != EPSILON_NO){
			  el.gen_epsilon_ = EPSILON_MAYBE;
			 }
			 break;
		        }
		  if(r->derive_t() == YES){
			has_list_chged = true;
			el.elem_t_ = advance_element(el.elem_t_);
		    el.gen_epsilon_ = EPSILON_NO;
			++i;// next sr in derive list
			break;
		  }
		  ++i;// indeterminate spot: next pass might free it up so next sr
		  break; 
		}
		case T_Enum::T_T_eosubrule_:{
		  if(el.gen_epsilon_!= EPSILON_NO){
                  el.rule_->epsilon(YES);
                  el.subrule_->epsilon(YES);
                }else{
                  el.subrule_->epsilon(NO);
                  el.rule_->derive_t(YES);
                }
		  has_list_chged = true;
		  i = derives_list_.erase(i);// finished with sr
		  break;
		}
		case T_Enum::T_T_null_call_thread_eosubrule_:{
        	  el.gen_epsilon_ = EPSILON_NO;
                el.subrule_->epsilon(NO);
                  el.rule_->derive_t(YES);
		  has_list_chged = true;
		  i = derives_list_.erase(i);// finished with sr
		  break;
		}
		case T_Enum::T_T_called_thread_eosubrule_:{
		  el.gen_epsilon_ = EPSILON_NO;
                  el.subrule_->epsilon(NO);
                  el.rule_->derive_t(YES);
		  has_list_chged = true;
		  i = derives_list_.erase(i);// finished with sr
		  break;
		}
		case T_Enum::T_refered_T_:{
		  el.gen_epsilon_ = EPSILON_NO;
                  el.subrule_->epsilon(NO);
		  has_list_chged = true;
		  el.elem_t_ = advance_element(el.elem_t_);
		  ++i;// next sr in derive list
		  refered_T* rt = (refered_T*)rteos;
		  break;
		}
		default:{ // lr k 
		  el.gen_epsilon_ = EPSILON_NO;
                  el.subrule_->epsilon(NO);
		  has_list_chged = true;
		  el.elem_t_ = advance_element(el.elem_t_);
		  ++i;// next sr in derive list
		  break;
		}
	  }
	       
	}
	if(derives_list_.empty() == true) return;// kosher grammar
	if(has_list_chged == true){// still o/s but going forward
	  has_list_chged = false;
          goto Read_list;		 
	}
}
	   void Cepsilon_rules::deal_with_undecided_derives_list(){
	     // chk if undefined rules were found: yes out damn spot
	     if(parser__->error_queue()->empty() != true) return;
	   	 // errors: pathological grammar
         std::list<elem_list_type >::iterator 
         i = derives_list_.begin();
		if(i == derives_list_.end()) return;
		for(;i != derives_list_.end();++i){
			  elem_list_type& el = *i;
			  CAbs_lr1_sym* elem = AST::content(*el.elem_t_);
			  CAbs_lr1_sym* sym = new ERR_sick_grammar;
			  sym->set_rc(*elem,__FILE__,__LINE__);
			  parser__->add_token_to_error_queue(*sym);
		 }
	   }
	   
  ***
  op
    rule_def_ = 0;
	subrule_def_ = 0;
	elem_t_ = 0;
    ip_can_ = (tok_can< AST* >*)parser__->token_supplier__;
  ***
}
,fsm-version "1.0",fsm-date "26 sept. 2005",fsm-debug "false"
,fsm-comments "Determine whether rules are epsilon, derive T, or are pathological.")
@"/usr/local/yacco2/compiler/grammars/yacco2_T_includes.T"

rules{
Repsilon_rules  
(){
  -> Rrules eog {
/@
Pass 2 and greater.
@/
  op
    Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__;
	fsm->deal_with_derives_list();
	if(fsm->derives_list_.empty() != true){
	  fsm->deal_with_undecided_derives_list();
	}
  ***
  }
}

Rrules  (){
  -> Rrule
  -> Rrules Rrule
}

Rrule  (){
  -> Rrule_def Rsubrules 
}

Rrule_def  (){
  -> "rule-def" {
  /@
  Initialize for its subrule findings.
  @/ 
  op
    Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__;
    fsm->rule_def_ = sf->p1__;
  ***
  }
}

Rsubrules  (){
  -> Rsubrule
  -> Rsubrules Rsubrule
}

Rsubrule  (){
  -> Rsubrule_def
}

Rsubrule_def  (){
  -> "subrule-def" {
  /@
  Create the entry within the |derives_list_|.
  This will be walked until either
  all the subrules' elements are derived or they are pathological.
  @/
  
  op
    Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__;
    fsm->subrule_def_ = sf->p1__;
    AST* sr_t = fsm->subrule_def_->subrule_s_tree();
    AST* et = AST::get_spec_child(*sr_t,1);
   fsm->derives_list_.push_back
	(elem_list_type(fsm->rule_def_,fsm->subrule_def_,et));
  ***
  }
}

}// end of rules
