1 /* 2 * Copyright (c) 2003, 2017, Oracle and/or its affiliates. All rights reserved. 3 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 4 * 5 * This code is free software; you can redistribute it and/or modify it 6 * under the terms of the GNU General Public License version 2 only, as 7 * published by the Free Software Foundation. Oracle designates this 8 * particular file as subject to the "Classpath" exception as provided 9 * by Oracle in the LICENSE file that accompanied this code. 10 * 11 * This code is distributed in the hope that it will be useful, but WITHOUT 12 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 13 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 14 * version 2 for more details (a copy is included in the LICENSE file that 15 * accompanied this code). 16 * 17 * You should have received a copy of the GNU General Public License version 18 * 2 along with this work; if not, write to the Free Software Foundation, 19 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 20 * 21 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA 22 * or visit www.oracle.com if you need additional information or have any 23 * questions. 24 */ 25 26 27 package com.sun.java_cup.internal.runtime; 28 29 import java.util.Stack; 30 31 /** This class implements a skeleton table driven LR parser. In general, 32 * LR parsers are a form of bottom up shift-reduce parsers. Shift-reduce 33 * parsers act by shifting input onto a parse stack until the Symbols 34 * matching the right hand side of a production appear on the top of the 35 * stack. Once this occurs, a reduce is performed. This involves removing 36 * the Symbols corresponding to the right hand side of the production 37 * (the so called "handle") and replacing them with the non-terminal from 38 * the left hand side of the production. <p> 39 * 40 * To control the decision of whether to shift or reduce at any given point, 41 * the parser uses a state machine (the "viable prefix recognition machine" 42 * built by the parser generator). The current state of the machine is placed 43 * on top of the parse stack (stored as part of a Symbol object representing 44 * a terminal or non terminal). The parse action table is consulted 45 * (using the current state and the current lookahead Symbol as indexes) to 46 * determine whether to shift or to reduce. When the parser shifts, it 47 * changes to a new state by pushing a new Symbol (containing a new state) 48 * onto the stack. When the parser reduces, it pops the handle (right hand 49 * side of a production) off the stack. This leaves the parser in the state 50 * it was in before any of those Symbols were matched. Next the reduce-goto 51 * table is consulted (using the new state and current lookahead Symbol as 52 * indexes) to determine a new state to go to. The parser then shifts to 53 * this goto state by pushing the left hand side Symbol of the production 54 * (also containing the new state) onto the stack.<p> 55 * 56 * This class actually provides four LR parsers. The methods parse() and 57 * debug_parse() provide two versions of the main parser (the only difference 58 * being that debug_parse() emits debugging trace messages as it parses). 59 * In addition to these main parsers, the error recovery mechanism uses two 60 * more. One of these is used to simulate "parsing ahead" in the input 61 * without carrying out actions (to verify that a potential error recovery 62 * has worked), and the other is used to parse through buffered "parse ahead" 63 * input in order to execute all actions and re-synchronize the actual parser 64 * configuration.<p> 65 * 66 * This is an abstract class which is normally filled out by a subclass 67 * generated by the JavaCup parser generator. In addition to supplying 68 * the actual parse tables, generated code also supplies methods which 69 * invoke various pieces of user supplied code, provide access to certain 70 * special Symbols (e.g., EOF and error), etc. Specifically, the following 71 * abstract methods are normally supplied by generated code: 72 * <dl compact> 73 * <dt> short[][] production_table() 74 * <dd> Provides a reference to the production table (indicating the index of 75 * the left hand side non terminal and the length of the right hand side 76 * for each production in the grammar). 77 * <dt> short[][] action_table() 78 * <dd> Provides a reference to the parse action table. 79 * <dt> short[][] reduce_table() 80 * <dd> Provides a reference to the reduce-goto table. 81 * <dt> int start_state() 82 * <dd> Indicates the index of the start state. 83 * <dt> int start_production() 84 * <dd> Indicates the index of the starting production. 85 * <dt> int EOF_sym() 86 * <dd> Indicates the index of the EOF Symbol. 87 * <dt> int error_sym() 88 * <dd> Indicates the index of the error Symbol. 89 * <dt> Symbol do_action() 90 * <dd> Executes a piece of user supplied action code. This always comes at 91 * the point of a reduce in the parse, so this code also allocates and 92 * fills in the left hand side non terminal Symbol object that is to be 93 * pushed onto the stack for the reduce. 94 * <dt> void init_actions() 95 * <dd> Code to initialize a special object that encapsulates user supplied 96 * actions (this object is used by do_action() to actually carry out the 97 * actions). 98 * </dl> 99 * 100 * In addition to these routines that <i>must</i> be supplied by the 101 * generated subclass there are also a series of routines that <i>may</i> 102 * be supplied. These include: 103 * <dl> 104 * <dt> Symbol scan() 105 * <dd> Used to get the next input Symbol from the scanner. 106 * <dt> Scanner getScanner() 107 * <dd> Used to provide a scanner for the default implementation of 108 * scan(). 109 * <dt> int error_sync_size() 110 * <dd> This determines how many Symbols past the point of an error 111 * must be parsed without error in order to consider a recovery to 112 * be valid. This defaults to 3. Values less than 2 are not 113 * recommended. 114 * <dt> void report_error(String message, Object info) 115 * <dd> This method is called to report an error. The default implementation 116 * simply prints a message to System.err and where the error occurred. 117 * This method is often replaced in order to provide a more sophisticated 118 * error reporting mechanism. 119 * <dt> void report_fatal_error(String message, Object info) 120 * <dd> This method is called when a fatal error that cannot be recovered from 121 * is encountered. In the default implementation, it calls 122 * report_error() to emit a message, then throws an exception. 123 * <dt> void syntax_error(Symbol cur_token) 124 * <dd> This method is called as soon as syntax error is detected (but 125 * before recovery is attempted). In the default implementation it 126 * invokes: report_error("Syntax error", null); 127 * <dt> void unrecovered_syntax_error(Symbol cur_token) 128 * <dd> This method is called if syntax error recovery fails. In the default 129 * implementation it invokes:<br> 130 * report_fatal_error("Couldn't repair and continue parse", null); 131 * </dl> 132 * 133 * @see com.sun.java_cup.internal.runtime.Symbol 134 * @see com.sun.java_cup.internal.runtime.Symbol 135 * @see com.sun.java_cup.internal.runtime.virtual_parse_stack 136 * @author Frank Flannery 137 */ 138 139 public abstract class lr_parser { 140 141 /*-----------------------------------------------------------*/ 142 /*--- Constructor(s) ----------------------------------------*/ 143 /*-----------------------------------------------------------*/ 144 145 /** Simple constructor. */ 146 public lr_parser() 147 { 148 /* nothing to do here */ 149 } 150 151 /** Constructor that sets the default scanner. [CSA/davidm] */ 152 public lr_parser(Scanner s) { 153 this(); /* in case default constructor someday does something */ 154 setScanner(s); 155 } 156 157 /*-----------------------------------------------------------*/ 158 /*--- (Access to) Static (Class) Variables ------------------*/ 159 /*-----------------------------------------------------------*/ 160 161 /** The default number of Symbols after an error we much match to consider 162 * it recovered from. 163 */ 164 protected final static int _error_sync_size = 3; 165 166 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 167 168 /** The number of Symbols after an error we much match to consider it 169 * recovered from. 170 */ 171 protected int error_sync_size() {return _error_sync_size; } 172 173 /*-----------------------------------------------------------*/ 174 /*--- (Access to) Instance Variables ------------------------*/ 175 /*-----------------------------------------------------------*/ 176 177 /** Table of production information (supplied by generated subclass). 178 * This table contains one entry per production and is indexed by 179 * the negative-encoded values (reduce actions) in the action_table. 180 * Each entry has two parts, the index of the non-terminal on the 181 * left hand side of the production, and the number of Symbols 182 * on the right hand side. 183 */ 184 public abstract short[][] production_table(); 185 186 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 187 188 /** The action table (supplied by generated subclass). This table is 189 * indexed by state and terminal number indicating what action is to 190 * be taken when the parser is in the given state (i.e., the given state 191 * is on top of the stack) and the given terminal is next on the input. 192 * States are indexed using the first dimension, however, the entries for 193 * a given state are compacted and stored in adjacent index, value pairs 194 * which are searched for rather than accessed directly (see get_action()). 195 * The actions stored in the table will be either shifts, reduces, or 196 * errors. Shifts are encoded as positive values (one greater than the 197 * state shifted to). Reduces are encoded as negative values (one less 198 * than the production reduced by). Error entries are denoted by zero. 199 * 200 * @see com.sun.java_cup.internal.runtime.lr_parser#get_action 201 */ 202 public abstract short[][] action_table(); 203 204 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 205 206 /** The reduce-goto table (supplied by generated subclass). This 207 * table is indexed by state and non-terminal number and contains 208 * state numbers. States are indexed using the first dimension, however, 209 * the entries for a given state are compacted and stored in adjacent 210 * index, value pairs which are searched for rather than accessed 211 * directly (see get_reduce()). When a reduce occurs, the handle 212 * (corresponding to the RHS of the matched production) is popped off 213 * the stack. The new top of stack indicates a state. This table is 214 * then indexed by that state and the LHS of the reducing production to 215 * indicate where to "shift" to. 216 * 217 * @see com.sun.java_cup.internal.runtime.lr_parser#get_reduce 218 */ 219 public abstract short[][] reduce_table(); 220 221 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 222 223 /** The index of the start state (supplied by generated subclass). */ 224 public abstract int start_state(); 225 226 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 227 228 /** The index of the start production (supplied by generated subclass). */ 229 public abstract int start_production(); 230 231 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 232 233 /** The index of the end of file terminal Symbol (supplied by generated 234 * subclass). 235 */ 236 public abstract int EOF_sym(); 237 238 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 239 240 /** The index of the special error Symbol (supplied by generated subclass). */ 241 public abstract int error_sym(); 242 243 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 244 245 /** Internal flag to indicate when parser should quit. */ 246 protected boolean _done_parsing = false; 247 248 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 249 250 /** This method is called to indicate that the parser should quit. This is 251 * normally called by an accept action, but can be used to cancel parsing 252 * early in other circumstances if desired. 253 */ 254 public void done_parsing() 255 { 256 _done_parsing = true; 257 } 258 259 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 260 /* Global parse state shared by parse(), error recovery, and 261 * debugging routines */ 262 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 263 264 /** Indication of the index for top of stack (for use by actions). */ 265 protected int tos; 266 267 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 268 269 /** The current lookahead Symbol. */ 270 protected Symbol cur_token; 271 272 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 273 274 /** The parse stack itself. */ 275 protected Stack<Symbol> stack = new Stack<>(); 276 277 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 278 279 /** Direct reference to the production table. */ 280 protected short[][] production_tab; 281 282 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 283 284 /** Direct reference to the action table. */ 285 protected short[][] action_tab; 286 287 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 288 289 /** Direct reference to the reduce-goto table. */ 290 protected short[][] reduce_tab; 291 292 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 293 294 /** This is the scanner object used by the default implementation 295 * of scan() to get Symbols. To avoid name conflicts with existing 296 * code, this field is private. [CSA/davidm] */ 297 private Scanner _scanner; 298 299 /** 300 * Simple accessor method to set the default scanner. 301 */ 302 public void setScanner(Scanner s) { _scanner = s; } 303 304 /** 305 * Simple accessor method to get the default scanner. 306 */ 307 public Scanner getScanner() { return _scanner; } 308 309 /*-----------------------------------------------------------*/ 310 /*--- General Methods ---------------------------------------*/ 311 /*-----------------------------------------------------------*/ 312 313 /** Perform a bit of user supplied action code (supplied by generated 314 * subclass). Actions are indexed by an internal action number assigned 315 * at parser generation time. 316 * 317 * @param act_num the internal index of the action to be performed. 318 * @param parser the parser object we are acting for. 319 * @param stack the parse stack of that object. 320 * @param top the index of the top element of the parse stack. 321 */ 322 public abstract Symbol do_action( 323 int act_num, 324 lr_parser parser, 325 Stack<Symbol> stack, 326 int top) 327 throws java.lang.Exception; 328 329 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 330 331 /** User code for initialization inside the parser. Typically this 332 * initializes the scanner. This is called before the parser requests 333 * the first Symbol. Here this is just a placeholder for subclasses that 334 * might need this and we perform no action. This method is normally 335 * overridden by the generated code using this contents of the "init with" 336 * clause as its body. 337 */ 338 public void user_init() throws java.lang.Exception { } 339 340 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 341 342 /** Initialize the action object. This is called before the parser does 343 * any parse actions. This is filled in by generated code to create 344 * an object that encapsulates all action code. 345 */ 346 protected abstract void init_actions() throws java.lang.Exception; 347 348 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 349 350 /** Get the next Symbol from the input (supplied by generated subclass). 351 * Once end of file has been reached, all subsequent calls to scan 352 * should return an EOF Symbol (which is Symbol number 0). By default 353 * this method returns getScanner().next_token(); this implementation 354 * can be overriden by the generated parser using the code declared in 355 * the "scan with" clause. Do not recycle objects; every call to 356 * scan() should return a fresh object. 357 */ 358 public Symbol scan() throws java.lang.Exception { 359 return getScanner().next_token(); 360 } 361 362 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 363 364 /** Report a fatal error. This method takes a message string and an 365 * additional object (to be used by specializations implemented in 366 * subclasses). Here in the base class a very simple implementation 367 * is provided which reports the error then throws an exception. 368 * 369 * @param message an error message. 370 * @param info an extra object reserved for use by specialized subclasses. 371 */ 372 public void report_fatal_error( 373 String message, 374 Object info) 375 throws java.lang.Exception 376 { 377 /* stop parsing (not really necessary since we throw an exception, but) */ 378 done_parsing(); 379 380 /* use the normal error message reporting to put out the message */ 381 report_error(message, info); 382 383 /* throw an exception */ 384 throw new Exception("Can't recover from previous error(s)"); 385 } 386 387 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 388 389 /** Report a non fatal error (or warning). This method takes a message 390 * string and an additional object (to be used by specializations 391 * implemented in subclasses). Here in the base class a very simple 392 * implementation is provided which simply prints the message to 393 * System.err. 394 * 395 * @param message an error message. 396 * @param info an extra object reserved for use by specialized subclasses. 397 */ 398 public void report_error(String message, Object info) 399 { 400 System.err.print(message); 401 if (info instanceof Symbol) 402 if (((Symbol)info).left != -1) 403 System.err.println(" at character " + ((Symbol)info).left + 404 " of input"); 405 else System.err.println(""); 406 else System.err.println(""); 407 } 408 409 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 410 411 /** This method is called when a syntax error has been detected and recovery 412 * is about to be invoked. Here in the base class we just emit a 413 * "Syntax error" error message. 414 * 415 * @param cur_token the current lookahead Symbol. 416 */ 417 public void syntax_error(Symbol cur_token) 418 { 419 report_error("Syntax error", cur_token); 420 } 421 422 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 423 424 /** This method is called if it is determined that syntax error recovery 425 * has been unsuccessful. Here in the base class we report a fatal error. 426 * 427 * @param cur_token the current lookahead Symbol. 428 */ 429 public void unrecovered_syntax_error(Symbol cur_token) 430 throws java.lang.Exception 431 { 432 report_fatal_error("Couldn't repair and continue parse", cur_token); 433 } 434 435 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 436 437 /** Fetch an action from the action table. The table is broken up into 438 * rows, one per state (rows are indexed directly by state number). 439 * Within each row, a list of index, value pairs are given (as sequential 440 * entries in the table), and the list is terminated by a default entry 441 * (denoted with a Symbol index of -1). To find the proper entry in a row 442 * we do a linear or binary search (depending on the size of the row). 443 * 444 * @param state the state index of the action being accessed. 445 * @param sym the Symbol index of the action being accessed. 446 */ 447 protected final short get_action(int state, int sym) 448 { 449 short tag; 450 int first, last, probe; 451 short[] row = action_tab[state]; 452 453 /* linear search if we are < 10 entries */ 454 if (row.length < 20) 455 for (probe = 0; probe < row.length; probe++) 456 { 457 /* is this entry labeled with our Symbol or the default? */ 458 tag = row[probe++]; 459 if (tag == sym || tag == -1) 460 { 461 /* return the next entry */ 462 return row[probe]; 463 } 464 } 465 /* otherwise binary search */ 466 else 467 { 468 first = 0; 469 last = (row.length-1)/2 - 1; /* leave out trailing default entry */ 470 while (first <= last) 471 { 472 probe = (first+last)/2; 473 if (sym == row[probe*2]) 474 return row[probe*2+1]; 475 else if (sym > row[probe*2]) 476 first = probe+1; 477 else 478 last = probe-1; 479 } 480 481 /* not found, use the default at the end */ 482 return row[row.length-1]; 483 } 484 485 /* shouldn't happened, but if we run off the end we return the 486 default (error == 0) */ 487 return 0; 488 } 489 490 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 491 492 /** Fetch a state from the reduce-goto table. The table is broken up into 493 * rows, one per state (rows are indexed directly by state number). 494 * Within each row, a list of index, value pairs are given (as sequential 495 * entries in the table), and the list is terminated by a default entry 496 * (denoted with a Symbol index of -1). To find the proper entry in a row 497 * we do a linear search. 498 * 499 * @param state the state index of the entry being accessed. 500 * @param sym the Symbol index of the entry being accessed. 501 */ 502 protected final short get_reduce(int state, int sym) 503 { 504 short tag; 505 short[] row = reduce_tab[state]; 506 507 /* if we have a null row we go with the default */ 508 if (row == null) 509 return -1; 510 511 for (int probe = 0; probe < row.length; probe++) 512 { 513 /* is this entry labeled with our Symbol or the default? */ 514 tag = row[probe++]; 515 if (tag == sym || tag == -1) 516 { 517 /* return the next entry */ 518 return row[probe]; 519 } 520 } 521 /* if we run off the end we return the default (error == -1) */ 522 return -1; 523 } 524 525 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 526 527 /** This method provides the main parsing routine. It returns only when 528 * done_parsing() has been called (typically because the parser has 529 * accepted, or a fatal error has been reported). See the header 530 * documentation for the class regarding how shift/reduce parsers operate 531 * and how the various tables are used. 532 */ 533 public Symbol parse() throws java.lang.Exception 534 { 535 /* the current action code */ 536 int act; 537 538 /* the Symbol/stack element returned by a reduce */ 539 Symbol lhs_sym = null; 540 541 /* information about production being reduced with */ 542 short handle_size, lhs_sym_num; 543 544 /* set up direct reference to tables to drive the parser */ 545 546 production_tab = production_table(); 547 action_tab = action_table(); 548 reduce_tab = reduce_table(); 549 550 /* initialize the action encapsulation object */ 551 init_actions(); 552 553 /* do user initialization */ 554 user_init(); 555 556 /* get the first token */ 557 cur_token = scan(); 558 559 /* push dummy Symbol with start state to get us underway */ 560 stack.removeAllElements(); 561 stack.push(new Symbol(0, start_state())); 562 tos = 0; 563 564 /* continue until we are told to stop */ 565 for (_done_parsing = false; !_done_parsing; ) 566 { 567 /* Check current token for freshness. */ 568 if (cur_token.used_by_parser) 569 throw new Error("Symbol recycling detected (fix your scanner)."); 570 571 /* current state is always on the top of the stack */ 572 573 /* look up action out of the current state with the current input */ 574 act = get_action((stack.peek()).parse_state, cur_token.sym); 575 576 /* decode the action -- > 0 encodes shift */ 577 if (act > 0) 578 { 579 /* shift to the encoded state by pushing it on the stack */ 580 cur_token.parse_state = act-1; 581 cur_token.used_by_parser = true; 582 stack.push(cur_token); 583 tos++; 584 585 /* advance to the next Symbol */ 586 cur_token = scan(); 587 } 588 /* if its less than zero, then it encodes a reduce action */ 589 else if (act < 0) 590 { 591 /* perform the action for the reduce */ 592 lhs_sym = do_action((-act)-1, this, stack, tos); 593 594 /* look up information about the production */ 595 lhs_sym_num = production_tab[(-act)-1][0]; 596 handle_size = production_tab[(-act)-1][1]; 597 598 /* pop the handle off the stack */ 599 for (int i = 0; i < handle_size; i++) 600 { 601 stack.pop(); 602 tos--; 603 } 604 605 /* look up the state to go to from the one popped back to */ 606 act = get_reduce((stack.peek()).parse_state, lhs_sym_num); 607 608 /* shift to that state */ 609 lhs_sym.parse_state = act; 610 lhs_sym.used_by_parser = true; 611 stack.push(lhs_sym); 612 tos++; 613 } 614 /* finally if the entry is zero, we have an error */ 615 else if (act == 0) 616 { 617 /* call user syntax error reporting routine */ 618 syntax_error(cur_token); 619 620 /* try to error recover */ 621 if (!error_recovery(false)) 622 { 623 /* if that fails give up with a fatal syntax error */ 624 unrecovered_syntax_error(cur_token); 625 626 /* just in case that wasn't fatal enough, end parse */ 627 done_parsing(); 628 } else { 629 lhs_sym = stack.peek(); 630 } 631 } 632 } 633 return lhs_sym; 634 } 635 636 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 637 638 /** Write a debugging message to System.err for the debugging version 639 * of the parser. 640 * 641 * @param mess the text of the debugging message. 642 */ 643 public void debug_message(String mess) 644 { 645 System.err.println(mess); 646 } 647 648 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 649 650 /** Dump the parse stack for debugging purposes. */ 651 public void dump_stack() 652 { 653 if (stack == null) 654 { 655 debug_message("# Stack dump requested, but stack is null"); 656 return; 657 } 658 659 debug_message("============ Parse Stack Dump ============"); 660 661 /* dump the stack */ 662 for (int i=0; i<stack.size(); i++) 663 { 664 debug_message("Symbol: " + (stack.get(i)).sym + 665 " State: " + (stack.get(i)).parse_state); 666 } 667 debug_message("=========================================="); 668 } 669 670 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 671 672 /** Do debug output for a reduce. 673 * 674 * @param prod_num the production we are reducing with. 675 * @param nt_num the index of the LHS non terminal. 676 * @param rhs_size the size of the RHS. 677 */ 678 public void debug_reduce(int prod_num, int nt_num, int rhs_size) 679 { 680 debug_message("# Reduce with prod #" + prod_num + " [NT=" + nt_num + 681 ", " + "SZ=" + rhs_size + "]"); 682 } 683 684 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 685 686 /** Do debug output for shift. 687 * 688 * @param shift_tkn the Symbol being shifted onto the stack. 689 */ 690 public void debug_shift(Symbol shift_tkn) 691 { 692 debug_message("# Shift under term #" + shift_tkn.sym + 693 " to state #" + shift_tkn.parse_state); 694 } 695 696 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 697 698 /** Do debug output for stack state. [CSA] 699 */ 700 public void debug_stack() { 701 StringBuilder sb=new StringBuilder("## STACK:"); 702 for (int i=0; i<stack.size(); i++) { 703 Symbol s = stack.get(i); 704 sb.append(" <state "+s.parse_state+", sym "+s.sym+">"); 705 if ((i%3)==2 || (i==(stack.size()-1))) { 706 debug_message(sb.toString()); 707 sb = new StringBuilder(" "); 708 } 709 } 710 } 711 712 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 713 714 /** Perform a parse with debugging output. This does exactly the 715 * same things as parse(), except that it calls debug_shift() and 716 * debug_reduce() when shift and reduce moves are taken by the parser 717 * and produces various other debugging messages. 718 */ 719 public Symbol debug_parse() 720 throws java.lang.Exception 721 { 722 /* the current action code */ 723 int act; 724 725 /* the Symbol/stack element returned by a reduce */ 726 Symbol lhs_sym = null; 727 728 /* information about production being reduced with */ 729 short handle_size, lhs_sym_num; 730 731 /* set up direct reference to tables to drive the parser */ 732 production_tab = production_table(); 733 action_tab = action_table(); 734 reduce_tab = reduce_table(); 735 736 debug_message("# Initializing parser"); 737 738 /* initialize the action encapsulation object */ 739 init_actions(); 740 741 /* do user initialization */ 742 user_init(); 743 744 /* the current Symbol */ 745 cur_token = scan(); 746 747 debug_message("# Current Symbol is #" + cur_token.sym); 748 749 /* push dummy Symbol with start state to get us underway */ 750 stack.removeAllElements(); 751 stack.push(new Symbol(0, start_state())); 752 tos = 0; 753 754 /* continue until we are told to stop */ 755 for (_done_parsing = false; !_done_parsing; ) 756 { 757 /* Check current token for freshness. */ 758 if (cur_token.used_by_parser) 759 throw new Error("Symbol recycling detected (fix your scanner)."); 760 761 /* current state is always on the top of the stack */ 762 //debug_stack(); 763 764 /* look up action out of the current state with the current input */ 765 act = get_action((stack.peek()).parse_state, cur_token.sym); 766 767 /* decode the action -- > 0 encodes shift */ 768 if (act > 0) 769 { 770 /* shift to the encoded state by pushing it on the stack */ 771 cur_token.parse_state = act-1; 772 cur_token.used_by_parser = true; 773 debug_shift(cur_token); 774 stack.push(cur_token); 775 tos++; 776 777 /* advance to the next Symbol */ 778 cur_token = scan(); 779 debug_message("# Current token is " + cur_token); 780 } 781 /* if its less than zero, then it encodes a reduce action */ 782 else if (act < 0) 783 { 784 /* perform the action for the reduce */ 785 lhs_sym = do_action((-act)-1, this, stack, tos); 786 787 /* look up information about the production */ 788 lhs_sym_num = production_tab[(-act)-1][0]; 789 handle_size = production_tab[(-act)-1][1]; 790 791 debug_reduce((-act)-1, lhs_sym_num, handle_size); 792 793 /* pop the handle off the stack */ 794 for (int i = 0; i < handle_size; i++) 795 { 796 stack.pop(); 797 tos--; 798 } 799 800 /* look up the state to go to from the one popped back to */ 801 act = get_reduce((stack.peek()).parse_state, lhs_sym_num); 802 debug_message("# Reduce rule: top state " + 803 (stack.peek()).parse_state + 804 ", lhs sym " + lhs_sym_num + " -> state " + act); 805 806 /* shift to that state */ 807 lhs_sym.parse_state = act; 808 lhs_sym.used_by_parser = true; 809 stack.push(lhs_sym); 810 tos++; 811 812 debug_message("# Goto state #" + act); 813 } 814 /* finally if the entry is zero, we have an error */ 815 else if (act == 0) 816 { 817 /* call user syntax error reporting routine */ 818 syntax_error(cur_token); 819 820 /* try to error recover */ 821 if (!error_recovery(true)) 822 { 823 /* if that fails give up with a fatal syntax error */ 824 unrecovered_syntax_error(cur_token); 825 826 /* just in case that wasn't fatal enough, end parse */ 827 done_parsing(); 828 } else { 829 lhs_sym = stack.peek(); 830 } 831 } 832 } 833 return lhs_sym; 834 } 835 836 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 837 /* Error recovery code */ 838 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 839 840 /** Attempt to recover from a syntax error. This returns false if recovery 841 * fails, true if it succeeds. Recovery happens in 4 steps. First we 842 * pop the parse stack down to a point at which we have a shift out 843 * of the top-most state on the error Symbol. This represents the 844 * initial error recovery configuration. If no such configuration is 845 * found, then we fail. Next a small number of "lookahead" or "parse 846 * ahead" Symbols are read into a buffer. The size of this buffer is 847 * determined by error_sync_size() and determines how many Symbols beyond 848 * the error must be matched to consider the recovery a success. Next, 849 * we begin to discard Symbols in attempt to get past the point of error 850 * to a point where we can continue parsing. After each Symbol, we attempt 851 * to "parse ahead" though the buffered lookahead Symbols. The "parse ahead" 852 * process simulates that actual parse, but does not modify the real 853 * parser's configuration, nor execute any actions. If we can parse all 854 * the stored Symbols without error, then the recovery is considered a 855 * success. Once a successful recovery point is determined, we do an 856 * actual parse over the stored input -- modifying the real parse 857 * configuration and executing all actions. Finally, we return the the 858 * normal parser to continue with the overall parse. 859 * 860 * @param debug should we produce debugging messages as we parse. 861 */ 862 protected boolean error_recovery(boolean debug) 863 throws java.lang.Exception 864 { 865 if (debug) debug_message("# Attempting error recovery"); 866 867 /* first pop the stack back into a state that can shift on error and 868 do that shift (if that fails, we fail) */ 869 if (!find_recovery_config(debug)) 870 { 871 if (debug) debug_message("# Error recovery fails"); 872 return false; 873 } 874 875 /* read ahead to create lookahead we can parse multiple times */ 876 read_lookahead(); 877 878 /* repeatedly try to parse forward until we make it the required dist */ 879 for (;;) 880 { 881 /* try to parse forward, if it makes it, bail out of loop */ 882 if (debug) debug_message("# Trying to parse ahead"); 883 if (try_parse_ahead(debug)) 884 { 885 break; 886 } 887 888 /* if we are now at EOF, we have failed */ 889 if (lookahead[0].sym == EOF_sym()) 890 { 891 if (debug) debug_message("# Error recovery fails at EOF"); 892 return false; 893 } 894 895 /* otherwise, we consume another Symbol and try again */ 896 if (debug) 897 debug_message("# Consuming Symbol #" + cur_err_token().sym); 898 restart_lookahead(); 899 } 900 901 /* we have consumed to a point where we can parse forward */ 902 if (debug) debug_message("# Parse-ahead ok, going back to normal parse"); 903 904 /* do the real parse (including actions) across the lookahead */ 905 parse_lookahead(debug); 906 907 /* we have success */ 908 return true; 909 } 910 911 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 912 913 /** Determine if we can shift under the special error Symbol out of the 914 * state currently on the top of the (real) parse stack. 915 */ 916 protected boolean shift_under_error() 917 { 918 /* is there a shift under error Symbol */ 919 return get_action((stack.peek()).parse_state, error_sym()) > 0; 920 } 921 922 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 923 924 /** Put the (real) parse stack into error recovery configuration by 925 * popping the stack down to a state that can shift on the special 926 * error Symbol, then doing the shift. If no suitable state exists on 927 * the stack we return false 928 * 929 * @param debug should we produce debugging messages as we parse. 930 */ 931 protected boolean find_recovery_config(boolean debug) 932 { 933 Symbol error_token; 934 int act; 935 936 if (debug) debug_message("# Finding recovery state on stack"); 937 938 /* Remember the right-position of the top symbol on the stack */ 939 int right_pos = (stack.peek()).right; 940 int left_pos = (stack.peek()).left; 941 942 /* pop down until we can shift under error Symbol */ 943 while (!shift_under_error()) 944 { 945 /* pop the stack */ 946 if (debug) 947 debug_message("# Pop stack by one, state was # " + 948 (stack.peek()).parse_state); 949 left_pos = ((Symbol)stack.pop()).left; 950 tos--; 951 952 /* if we have hit bottom, we fail */ 953 if (stack.empty()) 954 { 955 if (debug) debug_message("# No recovery state found on stack"); 956 return false; 957 } 958 } 959 960 /* state on top of the stack can shift under error, find the shift */ 961 act = get_action((stack.peek()).parse_state, error_sym()); 962 if (debug) 963 { 964 debug_message("# Recover state found (#" + 965 (stack.peek()).parse_state + ")"); 966 debug_message("# Shifting on error to state #" + (act-1)); 967 } 968 969 /* build and shift a special error Symbol */ 970 error_token = new Symbol(error_sym(), left_pos, right_pos); 971 error_token.parse_state = act-1; 972 error_token.used_by_parser = true; 973 stack.push(error_token); 974 tos++; 975 976 return true; 977 } 978 979 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 980 981 /** Lookahead Symbols used for attempting error recovery "parse aheads". */ 982 protected Symbol lookahead[]; 983 984 /** Position in lookahead input buffer used for "parse ahead". */ 985 protected int lookahead_pos; 986 987 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 988 989 /** Read from input to establish our buffer of "parse ahead" lookahead 990 * Symbols. 991 */ 992 protected void read_lookahead() throws java.lang.Exception 993 { 994 /* create the lookahead array */ 995 lookahead = new Symbol[error_sync_size()]; 996 997 /* fill in the array */ 998 for (int i = 0; i < error_sync_size(); i++) 999 { 1000 lookahead[i] = cur_token; 1001 cur_token = scan(); 1002 } 1003 1004 /* start at the beginning */ 1005 lookahead_pos = 0; 1006 } 1007 1008 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1009 1010 /** Return the current lookahead in our error "parse ahead" buffer. */ 1011 protected Symbol cur_err_token() { return lookahead[lookahead_pos]; } 1012 1013 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1014 1015 /** Advance to next "parse ahead" input Symbol. Return true if we have 1016 * input to advance to, false otherwise. 1017 */ 1018 protected boolean advance_lookahead() 1019 { 1020 /* advance the input location */ 1021 lookahead_pos++; 1022 1023 /* return true if we didn't go off the end */ 1024 return lookahead_pos < error_sync_size(); 1025 } 1026 1027 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1028 1029 /** Reset the parse ahead input to one Symbol past where we started error 1030 * recovery (this consumes one new Symbol from the real input). 1031 */ 1032 protected void restart_lookahead() throws java.lang.Exception 1033 { 1034 /* move all the existing input over */ 1035 for (int i = 1; i < error_sync_size(); i++) 1036 lookahead[i-1] = lookahead[i]; 1037 1038 /* read a new Symbol into the last spot */ 1039 cur_token = scan(); 1040 lookahead[error_sync_size()-1] = cur_token; 1041 1042 /* reset our internal position marker */ 1043 lookahead_pos = 0; 1044 } 1045 1046 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1047 1048 /** Do a simulated parse forward (a "parse ahead") from the current 1049 * stack configuration using stored lookahead input and a virtual parse 1050 * stack. Return true if we make it all the way through the stored 1051 * lookahead input without error. This basically simulates the action of 1052 * parse() using only our saved "parse ahead" input, and not executing any 1053 * actions. 1054 * 1055 * @param debug should we produce debugging messages as we parse. 1056 */ 1057 protected boolean try_parse_ahead(boolean debug) 1058 throws java.lang.Exception 1059 { 1060 int act; 1061 short lhs, rhs_size; 1062 1063 /* create a virtual stack from the real parse stack */ 1064 virtual_parse_stack vstack = new virtual_parse_stack(stack); 1065 1066 /* parse until we fail or get past the lookahead input */ 1067 for (;;) 1068 { 1069 /* look up the action from the current state (on top of stack) */ 1070 act = get_action(vstack.top(), cur_err_token().sym); 1071 1072 /* if its an error, we fail */ 1073 if (act == 0) return false; 1074 1075 /* > 0 encodes a shift */ 1076 if (act > 0) 1077 { 1078 /* push the new state on the stack */ 1079 vstack.push(act-1); 1080 1081 if (debug) debug_message("# Parse-ahead shifts Symbol #" + 1082 cur_err_token().sym + " into state #" + (act-1)); 1083 1084 /* advance simulated input, if we run off the end, we are done */ 1085 if (!advance_lookahead()) return true; 1086 } 1087 /* < 0 encodes a reduce */ 1088 else 1089 { 1090 /* if this is a reduce with the start production we are done */ 1091 if ((-act)-1 == start_production()) 1092 { 1093 if (debug) debug_message("# Parse-ahead accepts"); 1094 return true; 1095 } 1096 1097 /* get the lhs Symbol and the rhs size */ 1098 lhs = production_tab[(-act)-1][0]; 1099 rhs_size = production_tab[(-act)-1][1]; 1100 1101 /* pop handle off the stack */ 1102 for (int i = 0; i < rhs_size; i++) 1103 vstack.pop(); 1104 1105 if (debug) 1106 debug_message("# Parse-ahead reduces: handle size = " + 1107 rhs_size + " lhs = #" + lhs + " from state #" + vstack.top()); 1108 1109 /* look up goto and push it onto the stack */ 1110 vstack.push(get_reduce(vstack.top(), lhs)); 1111 if (debug) 1112 debug_message("# Goto state #" + vstack.top()); 1113 } 1114 } 1115 } 1116 1117 /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ 1118 1119 /** Parse forward using stored lookahead Symbols. In this case we have 1120 * already verified that parsing will make it through the stored lookahead 1121 * Symbols and we are now getting back to the point at which we can hand 1122 * control back to the normal parser. Consequently, this version of the 1123 * parser performs all actions and modifies the real parse configuration. 1124 * This returns once we have consumed all the stored input or we accept. 1125 * 1126 * @param debug should we produce debugging messages as we parse. 1127 */ 1128 protected void parse_lookahead(boolean debug) 1129 throws java.lang.Exception 1130 { 1131 /* the current action code */ 1132 int act; 1133 1134 /* the Symbol/stack element returned by a reduce */ 1135 Symbol lhs_sym = null; 1136 1137 /* information about production being reduced with */ 1138 short handle_size, lhs_sym_num; 1139 1140 /* restart the saved input at the beginning */ 1141 lookahead_pos = 0; 1142 1143 if (debug) 1144 { 1145 debug_message("# Reparsing saved input with actions"); 1146 debug_message("# Current Symbol is #" + cur_err_token().sym); 1147 debug_message("# Current state is #" + 1148 (stack.peek()).parse_state); 1149 } 1150 1151 /* continue until we accept or have read all lookahead input */ 1152 while(!_done_parsing) 1153 { 1154 /* current state is always on the top of the stack */ 1155 1156 /* look up action out of the current state with the current input */ 1157 act = 1158 get_action((stack.peek()).parse_state, cur_err_token().sym); 1159 1160 /* decode the action -- > 0 encodes shift */ 1161 if (act > 0) 1162 { 1163 /* shift to the encoded state by pushing it on the stack */ 1164 cur_err_token().parse_state = act-1; 1165 cur_err_token().used_by_parser = true; 1166 if (debug) debug_shift(cur_err_token()); 1167 stack.push(cur_err_token()); 1168 tos++; 1169 1170 /* advance to the next Symbol, if there is none, we are done */ 1171 if (!advance_lookahead()) 1172 { 1173 if (debug) debug_message("# Completed reparse"); 1174 1175 /* scan next Symbol so we can continue parse */ 1176 // BUGFIX by Chris Harris <ckharris@ucsd.edu>: 1177 // correct a one-off error by commenting out 1178 // this next line. 1179 /*cur_token = scan();*/ 1180 1181 /* go back to normal parser */ 1182 return; 1183 } 1184 1185 if (debug) 1186 debug_message("# Current Symbol is #" + cur_err_token().sym); 1187 } 1188 /* if its less than zero, then it encodes a reduce action */ 1189 else if (act < 0) 1190 { 1191 /* perform the action for the reduce */ 1192 lhs_sym = do_action((-act)-1, this, stack, tos); 1193 1194 /* look up information about the production */ 1195 lhs_sym_num = production_tab[(-act)-1][0]; 1196 handle_size = production_tab[(-act)-1][1]; 1197 1198 if (debug) debug_reduce((-act)-1, lhs_sym_num, handle_size); 1199 1200 /* pop the handle off the stack */ 1201 for (int i = 0; i < handle_size; i++) 1202 { 1203 stack.pop(); 1204 tos--; 1205 } 1206 1207 /* look up the state to go to from the one popped back to */ 1208 act = get_reduce((stack.peek()).parse_state, lhs_sym_num); 1209 1210 /* shift to that state */ 1211 lhs_sym.parse_state = act; 1212 lhs_sym.used_by_parser = true; 1213 stack.push(lhs_sym); 1214 tos++; 1215 1216 if (debug) debug_message("# Goto state #" + act); 1217 1218 } 1219 /* finally if the entry is zero, we have an error 1220 (shouldn't happen here, but...)*/ 1221 else if (act == 0) 1222 { 1223 report_fatal_error("Syntax error", lhs_sym); 1224 return; 1225 } 1226 } 1227 1228 1229 } 1230 1231 /*-----------------------------------------------------------*/ 1232 1233 /** Utility function: unpacks parse tables from strings */ 1234 protected static short[][] unpackFromStrings(String[] sa) 1235 { 1236 // Concatanate initialization strings. 1237 StringBuilder sb = new StringBuilder(sa[0]); 1238 for (int i=1; i<sa.length; i++) 1239 sb.append(sa[i]); 1240 int n=0; // location in initialization string 1241 int size1 = (((int)sb.charAt(n))<<16) | ((int)sb.charAt(n+1)); n+=2; 1242 short[][] result = new short[size1][]; 1243 for (int i=0; i<size1; i++) { 1244 int size2 = (((int)sb.charAt(n))<<16) | ((int)sb.charAt(n+1)); n+=2; 1245 result[i] = new short[size2]; 1246 for (int j=0; j<size2; j++) 1247 result[i][j] = (short) (sb.charAt(n++)-2); 1248 } 1249 return result; 1250 } 1251 }