Name:          findstruct
Version:       0.0.1
Release:       1%{?dist}
Summary:       Java library for structuring clinical documents
# CDDL or GPLv2 with exceptions:
# ./FindStruct/src/findstruct/AbsoluteConstraints.java
# ./FindStruct/src/findstruct/AbsoluteLayout.java
# ASL 2.0 pom file
# MIT project license see FindStruct_doc.html
License:       ASL 2.0 and MIT and (CDDL or GPLv2 with exceptions)
URL:           http://groups.csail.mit.edu/medg/projects/text/findstruct/
Source0:       http://groups.csail.mit.edu/medg/projects/text/findstruct/FindStruct_multiple.zip
Source1:       http://central.maven.org/maven2/edu/mit/findstruct/findstructapi/%{version}/findstructapi-%{version}.pom
Source2:       http://www.apache.org/licenses/LICENSE-2.0.txt

BuildRequires: ant
BuildRequires: javapackages-local
BuildRequires: jdom

BuildArch:     noarch

%description
Many clinical documents are not well structured in the sense of
being based on a formal markup language, but nevertheless have
significant clues within their text that allow a heuristic computer
program to recover much of their structure and thereby to create a
version with the formal markup included. The FindStruct program is
designed to do just this task.

This program, and others like it, work best when the clinical
documents are created using templates that indicate the document
structure as embedded labels, headings, etc., or when documents have
been generated by computer programs that were designed to support
readability by human readers, but not by other computer programs.
Fortunately, we have encountered a significant number of such sources
of documents, and the current program is the result of generalizing
methods from a handful of these.

FindStruct takes an XML file that specifies the clues to be sought in
a text document to mark the beginnings (and occasionally the patterns)
of text sections and subsections. It then processes a set of text
documents, searching for the specified structures in each, and writes
an XML file with the same content as the original document but with
explicit markup inserted to indicate the structures that have been
found. Optionally, it also recognizes numbered lists of items and
outputs XML tags to indicate these.

FindStructAPI is used by Apache cTAKES. It was originally developed
out of mit.edu's NLP group.

%package javadoc
Summary:       Javadoc for %{name}

%description javadoc
This package contains javadoc for %{name}.

%prep
%setup -q -c
# Cleanup
find . -name '*.bat' -print -delete
find . -name '*.class' -print -delete
find . -name '*.jar' -print -delete

# fix non ASCII chars
for s in FindStruct/src/findstruct/DParagraph.java;do
  native2ascii -encoding UTF8 ${s} ${s}
done

# Customize pom file
cp -p %SOURCE1 pom.xml
%pom_add_dep org.jdom:jdom:1.1.3

chmod 644 FindStruct/FindStruct_doc.html
cp -p %SOURCE2 LICENSE.txt
sed -i 's/\r//' LICENSE.txt FindStruct/FindStruct_doc.html

%mvn_file edu.mit.findstruct:findstructapi %{name} findstructapi FindStruct

%build

ant -f FindStruct/build.xml \
 -Djavac.source=1.6 \
 -Djavac.target=1.6 \
 -Djavadoc.additionalparam="-Xdoclint:none" \
 -Djavadoc.windowtitle="FindStruct API" \
 -Dfile.reference.jdom.jar=$(build-classpath jdom)

%install
%mvn_artifact pom.xml FindStruct/dist/FindStruct.jar
%mvn_install -J FindStruct/dist/javadoc

%files -f .mfiles
%doc FindStruct/FindStruct_doc.html
%license LICENSE.txt

%files javadoc -f .mfiles-javadoc
%license LICENSE.txt

%changelog
* Sat Sep 12 2015 gil cattaneo 0.0.1-1
- initial rpm