Name:          findstructapi
Version:       0.0.1
Release:       1%{?dist}
Summary:       Java library for structuring clinical documents
# ASL 2.0 pom file
# MIT project license see FindStruct_doc.html
License:       ASL 2.0 and MIT
URL:           http://groups.csail.mit.edu/medg/projects/text/findstruct/
#Source0:      http://groups.csail.mit.edu/medg/projects/text/findstruct/FindStruct_multiple.zip
Source0:       http://repo1.maven.org/maven2/edu/mit/findstruct/%{name}/%{version}/%{name}-%{version}-sources.jar
Source1:       http://repo1.maven.org/maven2/edu/mit/findstruct/%{name}/%{version}/%{name}-%{version}.pom
Patch0:        findstructapi-0.0.1-remove-netbeans-awtextra.patch

BuildRequires: ant
BuildRequires: javapackages-local
BuildRequires: jdom

BuildArch:     noarch

%description
Many clinical documents are not well structured in the sense of being
based on a formal markup language, but nevertheless have significant
clues within their text that allow a heuristic computer program to
recover much of their structure and thereby to create a version with
the formal markup included. The FindStruct program is designed to do
just this task.

This program, and others like it, work best when the clinical documents
are created using templates that indicate the document structure as
embedded labels, headings, etc., or when documents have been generated
by computer programs that were designed to support readability by human
readers, but not by other computer programs. Fortunately, we have
encountered a significant number of such sources of documents, and the
current program is the result of generalizing methods from a handful
of these.

FindStruct takes an XML file that specifies the clues to be sought in a
text document to mark the beginnings (and occasionally the patterns) of
text sections and subsections. It then processes a set of text
documents, searching for the specified structures in each, and writes
an XML file with the same content as the original document but with
explicit markup inserted to indicate the structures that have been
found. Optionally, it also recognizes numbered lists of items and
outputs XML tags to indicate these.

FindStructAPI is used by Apache cTAKES. It was originally developed
out of mit.edu's NLP group.

%package javadoc
Summary:       Javadoc for %{name}

%description javadoc
This package contains javadoc for %{name}.

%prep
%setup -q -c
# Cleanup
rm -r __MACOSX
find . -name '*.class' -print -delete
find . -name '*.jar' -print -delete
rm -r FindStruct/redist/* FindStruct/build/*

%patch0 -p1

# fix non ASCII chars
for s in FindStruct/src/findstruct/DParagraph.java; do
  native2ascii -encoding UTF8 ${s} ${s}
done

# Customize pom file
cp -p %SOURCE1 pom.xml
%pom_add_dep org.jdom:jdom:1.1.3

chmod 644 FindStruct/FindStruct_doc.html
sed -i 's/\r//' FindStruct/FindStruct_doc.html

%mvn_file edu.mit.findstruct:findstructapi %{name} findstructapi FindStruct

%build

ant -f FindStruct/build.xml \
  -Djavac.source=1.6 \
  -Djavac.target=1.6 \
  -Djavadoc.additionalparam="-Xdoclint:none" \
  -Djavadoc.windowtitle="FindStruct API" \
  -Dfile.reference.jdom.jar=$(build-classpath jdom)

%install
%mvn_artifact pom.xml FindStruct/dist/FindStruct.jar
%mvn_install -J FindStruct/dist/javadoc

%files -f .mfiles
%doc FindStruct/FindStruct_doc.html

%files javadoc -f .mfiles-javadoc

%changelog
* Sat Sep 12 2015 gil cattaneo 0.0.1-1
- initial rpm