Monday, February 28, 2011

Java regex to match words with only Alphanumeric and Punctuation characters

The title says it all:

 private static final String onlyAlphaNumericAndPunctuationRegex = "[\\p{Alnum}\\p{Punct}]*";

//returns true
"0123!\"#$%&<=~abcijkxyzABC".matches(onlyAlphaNumericAndPunctuationRegex);

//returns false
"hello test space".matches(onlyAlphaNumericAndPunctuationRegex);

//returns false
"some_àèìòù-ÀÈÌÒÙ_more".matches(onlyAlphaNumericAndPunctuationRegex);


If you need at least one character, change the * quantifier to {1,} (or its shorthand +):

"[\\p{Alnum}\\p{Punct}]{1,}";
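Here is a small self-contained sketch that runs both variants against the strings above. The class, method and constant names are made up for illustration. It precompiles the pattern, because String.matches() compiles the regex again on every call. Also note that by default Java's POSIX classes \p{Alnum} and \p{Punct} cover US-ASCII only, which is why the accented string is rejected.

```java
import java.util.regex.Pattern;

public class AlnumPunctCheck {

    // Zero or more ASCII letters, digits or punctuation characters (the empty string matches).
    static final String ZERO_OR_MORE = "[\\p{Alnum}\\p{Punct}]*";

    // Same character class, but at least one character is required (+ is the same as {1,}).
    static final String ONE_OR_MORE = "[\\p{Alnum}\\p{Punct}]+";

    // Compiled once and reused; String.matches() would recompile the regex on each call.
    private static final Pattern ONE_OR_MORE_PATTERN = Pattern.compile(ONE_OR_MORE);

    // Hypothetical helper: true if s is non-empty and made only of ASCII alnum/punct chars.
    static boolean isAlnumPunct(String s) {
        return ONE_OR_MORE_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("".matches(ZERO_OR_MORE));               // true
        System.out.println(isAlnumPunct(""));                        // false
        System.out.println(isAlnumPunct("0123!\"#$%&<=~abcABC"));   // true
        System.out.println(isAlnumPunct("hello test space"));       // false (spaces)
        // Accented letters written as escapes so the source file encoding does not matter.
        System.out.println(isAlnumPunct(
                "some_\u00e0\u00e8\u00ec\u00f2\u00f9-\u00c0\u00c8\u00cc\u00d2\u00d9_more")); // false
    }
}
```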

Friday, February 11, 2011

org.apache.abdera.parser.ParseException: com.ctc.wstx.exc.WstxException: Illegal null byte in input stream

Got this at work a couple of weeks ago and just got around to putting it up here.
We use Abdera 1.0 to parse the XML returned by a RESTful call:

 org.apache.abdera.parser.stax.FOMParser feedParser = new org.apache.abdera.parser.stax.FOMParser();
 InputStream in = new URL("http://randomdonkeys.com/give_me_my_donkey.atom").openStream();
 Document<Feed> feedDoc = feedParser.parse(in);

The .parse call threw the WstxException. After digging around, I discovered that the Abdera 1.0 jar has trouble parsing a ChunkedInputStream (the stream the JDK's HTTP client uses for responses sent with chunked transfer encoding). I was able to get around it by wrapping the InputStream in an InputStreamReader:

Document<Feed> feedDoc = feedParser.parse(new InputStreamReader(in));
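One thing worth knowing about this workaround: new InputStreamReader(in) decodes with the platform's default charset, not the encoding declared in the XML. The sketch below uses only the JDK and assumes the feed is UTF-8; the helper names openUtf8Reader and readAll are hypothetical, and a ByteArrayInputStream stands in for the URL stream. With Abdera you would hand the resulting Reader to feedParser.parse(...) as in the line above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FeedReaderHelper {

    // Hypothetical helper: wraps the raw stream in a Reader with an explicit charset.
    // Assumes the feed is UTF-8; new InputStreamReader(in) alone would use the platform default.
    static Reader openUtf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Reads everything from a Reader into a String (only used for this demo).
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the URL stream: UTF-8 bytes of a tiny Atom document.
        byte[] bytes = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>caf\u00e9</title></feed>"
                .getBytes(StandardCharsets.UTF_8);
        Reader reader = openUtf8Reader(new ByteArrayInputStream(bytes));
        // With Abdera, this Reader is what you would pass to feedParser.parse(reader).
        System.out.println(readAll(reader));
    }
}
```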

Hope this helps someone at some point.