de.matthias_burbach.util
Class Diff
java.lang.Object
de.matthias_burbach.util.Diff
- public class Diff
- extends java.lang.Object
Diff - A Text file difference utility.
Copyright 1987, 1989 by Donald C. Lindsay,
School of Computer Science, Carnegie Mellon University.
Copyright 1982 by Symbionics.
Use without fee is permitted when not for direct commercial
advantage, and when credit to the source is given. Other uses
require specific permission.
Converted from C to Java by Ian F. Darwin, ian@darwinsys.com, January, 1997.
Copyright 1997, Ian F. Darwin.
Adapted to specific personal needs by Matthias Burbach
- replaced deprecated DataInputStream usage by BufferedReader
- added the field 'printStream' and its usage to configure where the output
goes to
- replaced System.exit calls by exceptions to avoid VM termination when
called in an embedded context)
- renamed classes to start with an upper case letter
- pushed helper class 'FileInfo' as inner class into the Diff class
Conversion is NOT FULLY TESTED.
Usage: Diff oldfile newfile
This program assumes that "oldfile" and "newfile" are text files.
The program writes to stdout a description of the changes which would
transform "oldfile" into "newfile".
The printout is in the form of commands, each followed by a block of
text. The text is delimited by the commands, which are:
- DELETE AT n
..deleted lines
-
INSERT BEFORE n
..inserted lines
-
n MOVED TO BEFORE n
..moved lines
-
n CHANGED FROM
..old lines
-
CHANGED TO
..newer lines
The line numbers all refer to the lines of the oldfile, as they are
numbered before any commands are applied.
The text lines are printed as-is, without indentation or prefixing. The
commands are printed in upper case, with a prefix of ">>>>", so that
they will stand out. Other schemes may be preferred.
Files which contain more than MAXLINECOUNT lines cannot be processed.
This can be fixed by changing "symbol" to a Vector.
The algorithm is taken from Communications of the ACM, Apr78 (21, 4, 264-),
"A Technique for Isolating Differences Between Files."
Ignoring I/O, and ignoring the symbol table, it should take O(N) time.
This implementation takes fixed space, plus O(U) space for the symbol
table (where U is the number of unique lines). Methods exist to change
the fixed space to O(N) space.
Note that this is not the only interesting file-difference algorithm. In
general, different algorithms draw different conclusions about the
changes that have been made to the oldfile. This algorithm is sometimes
"more right", particularly since it does not consider a block move to be
an insertion and a (separate) deletion. However, on some files it will be
"less right". This is a consequence of the fact that files may contain
many identical lines (particularly if they are program source). Each
algorithm resolves the ambiguity in its own way, and the resolution
is never guaranteed to be "right". However, it is often excellent.
This program is intended to be pedagogic. Specifically, this program was
the basis of the Literate Programming column which appeared in the
Communications of the ACM (CACM), in the June 1989 issue (32, 6,
740-755).
By "pedagogic", I do not mean that the program is gracefully worded, or
that it showcases language features or its algorithm. I also do not mean
that it is highly accessible to beginners, or that it is intended to be
read in full, or in a particular order. Rather, this program is an
example of one professional's style of keeping things organized and
maintainable.
The program would be better if the "print" variables were wrapped into
a struct. In general, grouping related variables in this way improves
documentation, and adds the ability to pass the group in argument lists.
This program is a de-engineered version of a program which uses less
memory and less time. The article points out that the "symbol" arrays
can be implemented as arrays of pointers to arrays, with dynamic
allocation of the subarrays. (In C, macros are very useful for hiding
the two-level accesses.) In Java, a Vector would be used. This allows an
extremely large value for MAXLINECOUNT, without dedicating fixed arrays.
(The "other" array can be allocated after the input phase, when the exact
sizes are known.) The only slow piece of code is the "strcmp" in the tree
descent: it can be speeded up by keeping a hash in the tree node, and
only using "strcmp" when two hashes happen to be equal.
Change Log
----------
1Jan97 Ian F. Darwin: first working rewrite in Java, based entirely on
D.C.Lindsay's reasonable C version.
Changed comments from /***************** to /**, shortened, added
whitespace, used tabs more, etc.
6jul89 D.C.Lindsay, CMU: fixed portability bug. Thanks, Gregg Wonderly.
Just changed "char ch" to "int ch".
Also added comment about way to improve code.
10jun89 D.C.Lindsay, CMU: posted version created.
Copyright notice changed to ACM style, and Dept. is now School.
ACM article referenced in docn.
26sep87 D.C.Lindsay, CMU: publication version created.
Condensed all 1982/83 change log entries.
Removed all command line options, and supporting code. This
simplified the input code (no case reduction etc). It also
simplified the symbol table, which was capable of remembering
offsets into files (instead of strings), and trusting (!) hash
values to be unique.
Removed dynamic allocation of arrays: now fixed static arrays.
Removed speed optimizations in symtab package.
Removed string compression/decompression code.
Recoded to Unix standards from old Lattice/MSDOS standards.
(This affected only the #include's and the IO.)
Some renaming of variables, and rewording of comments.
1982/83 D.C.Lindsay, Symbionics: created.
- Version:
- Java version 0.9, 1997
- Author:
- Ian F. Darwin, Java version, D. C. Lindsay, C version (1982-1987)
Nested Class Summary |
(package private) class |
Diff.FileInfo
This is the info kept per file. |
Constructor Summary |
Diff()
Construct a Diff object. |
Diff(java.io.PrintWriter printWriter)
|
Method Summary |
void |
doDiff(java.lang.String oldFile,
java.lang.String newFile)
Do one file comparison. |
(package private) void |
inputscan(Diff.FileInfo pinfo)
inputscan Reads the file specified by pinfo.file.
--------- Places the lines of that file in the symbol table.
|
static void |
main(java.lang.String[] argstrings)
main - entry point when used standalone.
|
(package private) void |
newconsume()
|
(package private) void |
oldconsume()
oldconsume Part of printout. |
void |
println(java.lang.String s)
Convenience wrapper for println |
(package private) void |
printout()
printout - Prints summary to stdout.
|
(package private) void |
scanafter()
|
(package private) void |
scanbefore()
scanbefore
As scanafter, except scans towards file fronts.
|
(package private) void |
scanblocks()
scanblocks - Finds the beginnings and lengths of blocks of matches.
|
(package private) void |
scanunique()
|
(package private) void |
showchange()
showchange Part of printout.
|
(package private) void |
showdelete()
showdelete Part of printout.
|
(package private) void |
showinsert()
|
(package private) void |
showmove()
showmove Part of printout.
|
(package private) void |
showsame()
showsame Part of printout.
|
(package private) void |
skipnew()
skipnew Part of printout.
|
(package private) void |
skipold()
skipold Part of printout.
|
(package private) void |
storeline(java.lang.String linebuffer,
Diff.FileInfo pinfo)
storeline Places line into symbol table.
--------- Expects pinfo.maxLine initted: increments.
|
(package private) void |
transform()
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
UNREAL
final int UNREAL
- block len > any possible real block len
- See Also:
- Constant Field Values
oldinfo
Diff.FileInfo oldinfo
- Keeps track of information about file1 and file2
newinfo
Diff.FileInfo newinfo
- Keeps track of information about file1 and file2
blocklen
int[] blocklen
- blocklen is the info about found blocks. It will be set to 0, except
at the line#s where blocks start in the old file. At these places it
will be set to the # of lines in the block. During printout ,
this # will be reset to -1 if the block is printed as a MOVE block
(because the printout phase will encounter the block twice, but
must only print it once.)
The array declarations are to MAXLINECOUNT+2 so that we can have two
extra lines (pseudolines) at line# 0 and line# MAXLINECOUNT+1
(or less).
printWriter
java.io.PrintWriter printWriter
idle
public static final int idle
- See Also:
- Constant Field Values
delete
public static final int delete
- See Also:
- Constant Field Values
insert
public static final int insert
- See Also:
- Constant Field Values
movenew
public static final int movenew
- See Also:
- Constant Field Values
moveold
public static final int moveold
- See Also:
- Constant Field Values
same
public static final int same
- See Also:
- Constant Field Values
change
public static final int change
- See Also:
- Constant Field Values
printstatus
int printstatus
anyprinted
boolean anyprinted
printoldline
int printoldline
printnewline
int printnewline
Diff
public Diff()
- Construct a Diff object.
Diff
public Diff(java.io.PrintWriter printWriter)
main
public static void main(java.lang.String[] argstrings)
- main - entry point when used standalone.
NOTE: no routines return error codes or throw any local
exceptions. Instead, any routine may complain
to stderr and then exit with error to the system.
doDiff
public void doDiff(java.lang.String oldFile,
java.lang.String newFile)
throws java.lang.Exception
- Do one file comparison. Called with both filenames.
- Throws:
java.lang.Exception
inputscan
void inputscan(Diff.FileInfo pinfo)
throws java.lang.Exception
- inputscan Reads the file specified by pinfo.file.
--------- Places the lines of that file in the symbol table.
Sets pinfo.maxLine to the number of lines found.
- Throws:
java.lang.Exception
storeline
void storeline(java.lang.String linebuffer,
Diff.FileInfo pinfo)
throws java.lang.Exception
- storeline Places line into symbol table.
--------- Expects pinfo.maxLine initted: increments.
Places symbol table handle in pinfo.ymbol.
Expects pinfo is either oldinfo or newinfo.
- Throws:
java.lang.Exception
transform
void transform()
scanunique
void scanunique()
scanafter
void scanafter()
scanbefore
void scanbefore()
- scanbefore
As scanafter, except scans towards file fronts.
Assumes the off-end lines have been marked as a match.
scanblocks
void scanblocks()
- scanblocks - Finds the beginnings and lengths of blocks of matches.
Sets the blocklen array (see definition).
Expects oldinfo valid.
printout
void printout()
throws java.lang.Exception
- printout - Prints summary to stdout.
Expects all data structures have been filled out.
- Throws:
java.lang.Exception
newconsume
void newconsume()
oldconsume
void oldconsume()
- oldconsume Part of printout. Have run out of new file.
Process the rest of the old file, printing any
parts which were deletes or moves.
showdelete
void showdelete()
- showdelete Part of printout.
Expects printoldline is at a deletion.
showinsert
void showinsert()
showchange
void showchange()
- showchange Part of printout.
Expects printnewline is an insertion.
Expects printoldline is a deletion.
skipold
void skipold()
- skipold Part of printout.
Expects printoldline at start of an old block that has
already been announced as a move.
Skips over the old block.
skipnew
void skipnew()
- skipnew Part of printout.
Expects printnewline is at start of a new block that has
already been announced as a move.
Skips over the new block.
showsame
void showsame()
throws java.lang.Exception
- showsame Part of printout.
Expects printnewline and printoldline at start of
two blocks that aren't to be displayed.
- Throws:
java.lang.Exception
showmove
void showmove()
- showmove Part of printout.
Expects printoldline, printnewline at start of
two different blocks ( a move was done).
println
public void println(java.lang.String s)
- Convenience wrapper for println