I'm trying to parse a text file that contains multiple lines of text. My job is to go through all the words and write them out to a file.
I read all the lines, loop through them, and split every line on whitespace, like this:
line.split("\\s+");
The problem is that in some cases Java does not see the space between two words...
I also tried looping through a string that contains a space Java doesn't split on, and Character.isSpaceChar(char) returned true for that character...
Now I'm completely confused...
Here is the code:
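    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public void createMap(String inputPath, String outputPath)
            throws IOException {
        File f = new File(inputPath);
        FileWriter fw = new FileWriter(outputPath);
        List<String> lines = Files.readAllLines(f.toPath(),
                StandardCharsets.UTF_8);
        for (String l : lines) {
            // Split each line on runs of whitespace as defined by \s
            for (String w : l.split("\\s+")) {
                if (isNotRubbish(w.trim())) {
                    fw.write(w.trim() + "\n");
                }
            }
        }
        fw.close();
    }

    // Accepts a word made only of letters, optionally prefixed with '@'
    private boolean isNotRubbish(String w) {
        Pattern p = Pattern.compile("@?\\p{L}+",
                Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = p.matcher(w);
        return m.matches();
    }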
I suspect your text contains a character such as a non-breaking space, which looks like a space but is not treated as whitespace by \s, so \s can't match it.
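A quick way to check this suspicion is to test the character directly. The sketch below assumes the culprit is U+00A0 (NO-BREAK SPACE), which is only a guess about the asker's file; the class name NbspCheck and its helper methods are made up for illustration.

```java
public class NbspCheck {
    // True if the character is matched by \s with Java's default
    // (non-Unicode) predefined character classes.
    static boolean matchesBackslashS(char c) {
        return String.valueOf(c).matches("\\s");
    }

    // Lists the code points of a string, e.g. "U+0061 U+00A0 U+0062",
    // so that invisible characters become visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0'; // assumed culprit: NO-BREAK SPACE
        System.out.println(Character.isSpaceChar(nbsp));  // true
        System.out.println(Character.isWhitespace(nbsp)); // false
        System.out.println(matchesBackslashS(nbsp));      // false
        System.out.println(matchesBackslashS(' '));       // true
        System.out.println(codePoints("a" + nbsp + "b")); // U+0061 U+00A0 U+0062
    }
}
```

Running codePoints on one of the problematic lines should show which character actually sits between the two words.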
In that case, try using \p{Zs} instead of \s.
As stated at http://www.regular-expressions.info/unicode.html:
\p{Zs} will match any kind of space character
BTW, if you would also like to include separators other than spaces, such as tabs or line breaks, you can combine \p{Zs} with \s in a character class, like [\p{Zs}\s].
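Applied to the split from the question, the combined class could look like the sketch below. The sample line and the names SplitOnAnySpace, SEPARATORS and words are invented for illustration, and the non-breaking space in the sample is an assumption about what the real file contains.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnAnySpace {
    // One or more characters that are either a Unicode space separator
    // (category Zs) or whitespace matched by the default \s class.
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Zs}\\s]+");

    static String[] words(String line) {
        return SEPARATORS.split(line);
    }

    public static void main(String[] args) {
        // "foo" and "bar" are separated by a NO-BREAK SPACE
        String line = "foo\u00A0bar\tbaz qux";

        // Plain \s+ leaves "foo" and "bar" glued together: 3 tokens
        System.out.println(line.split("\\s+").length);

        // The combined class separates all four words
        System.out.println(Arrays.toString(words(line))); // [foo, bar, baz, qux]
    }
}
```

Another option that may be worth trying is compiling \s+ with Pattern.UNICODE_CHARACTER_CLASS, the flag already used in isNotRubbish, since that flag makes \s follow the Unicode White_Space property, which includes U+00A0.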