Need a little help with Notepad++

yurihellsing · May 22, 2013

A little backstory first: I'm using a script to translate tags on a Japanese version of DeviantArt. It has an option to batch-add tag translations, but setting up the file can take some time and effort.

So I want to speed this process up by using the character lists found on Wikipedia, which give the names in both English and Japanese. The method I want to use is to copy the HTML into Notepad++ and isolate the names using a regular expression. I've gotten this (clicky) far, but Notepad++ doesn't seem to let me copy the marked text, and I have tried everything. Would anyone have an idea of either A) what I mean, B) how to do this, or C) a better way to achieve what I want, which is a copyable list of the names (see ex. 1)?

ex. 1:
eng | JP
eng | JP
eng | JP

Thanks, and feel free to ask for a rephrase, as it's 7am and I've been up all night on this.

alpenwasser · May 22, 2013

I don't really know Notepad++, but seeing as regular expressions are more or less universal (with some things specific to each implementation) I might be able to help out anyway.

To make sure I understand this correctly: The regex operation you want to do is substitute something like this:

<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">[...Japanese characters...]</span><span class="t_nihongo_comma" style="display:none">,</span> <i>[...English translation...]</i>

with

[...Japanese characters...] | [...English translation...]

?

So I would suggest running a search and replace on your text, specifically:

Find:

[...your regular expression from screenshot...]

Replace:

\2 | \1

Or something similar.

EDIT: Sorry, had to drive my neighbour to the emergency room, he's recently had knee surgery and it's acting up. Anyway, one more thing I wanted to add: I'm not sure which characters need to be escaped in Notepad++'s regex engine (the |, for example). If you're getting errors or unexpected results, that is definitely something I would look into as well.
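For anyone who wants to try the swap outside Notepad++, here is a minimal Python sketch of the same idea. The sample line is only modelled on the markup quoted above (the names in it are illustrative), and Python's `re` module is a stand-in, not NP++'s engine, so escaping rules may differ.

```python
import re

# Sample line modelled on the markup quoted above; the names are illustrative.
line = (
    '<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">鈴原 みさき</span>'
    '<span class="t_nihongo_comma" style="display:none">,</span> '
    '<i>Suzuhara Misaki</i>'
)

# Group 1 captures the Japanese text, group 2 the English rendering.
pattern = re.compile(
    r'<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">(.*?)</span>'
    r'<span class="t_nihongo_comma" style="display:none">,</span> '
    r'<i>(.*?)</i>'
)

# In the replacement string "|" is an ordinary character, so it is not escaped.
swapped = pattern.sub(r'\2 | \1', line)
print(swapped)  # Suzuhara Misaki | 鈴原 みさき
```

In the search pattern itself a literal | would need a backslash in Python, since | means alternation there. I would expect NP++ to behave similarly, but that is an assumption worth checking.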

yurihellsing · May 22, 2013

Yeah, I know how to find the string I want. It's just that I would like to copy it the way you can in Word: when you search for a word there, it highlights it ready for copying or cutting. Notepad++ only lets you copy/cut the whole line that the string you're looking for is in.

alpenwasser · May 22, 2013

Hm. What exactly happens when you do the above search and replace operation? I currently don't have access to a Windows machine so I can't try it out myself, but if all else fails I will have one tomorrow where I could play around with it.

Could you maybe post the source code for the page you're trying to parse? That would allow me to tinker around with it.

yurihellsing · May 22, 2013

This is the page I'm starting with. I'm sort of assuming that all lists will be laid out like this in HTML: http://en.wikipedia.org/wiki/List_of_Angelic_Layer_characters

alpenwasser · May 22, 2013

Ok, just as an example: What you're looking for is a new list looking something like this:

Suzuhara Misaki | 鈴原 みさき
Mihara Ichirō | 三原 一郎
Kobayashi Hatoko | 小林 鳩子
etc.

(I know it would look better like this:

Suzuhara Misaki  | 鈴原 みさき
Mihara Ichirō    | 三原 一郎
Kobayashi Hatoko | 小林 鳩子
etc.

but that will be very tricky I suspect.)

yurihellsing · May 22, 2013

Yes, that is what I'm after.

alpenwasser · May 22, 2013

Ok I'll give it a shot tomorrow.

yurihellsing · May 23, 2013

Thank you ever so much!!

alpenwasser · May 23, 2013

Ok, I think I got it.

Go to "Search" (Ctrl-F).

Go to the "Mark" tab.

Into the "Find what", enter this regex:

^.*?xml:lang="ja">(.*?)</span><span class="t_nihongo_comma" style="display:none">,</span> <i>(.*?)</i>.*?$

Check the "Bookmark line" checkbox.

Select "Regular expression" in "Search Mode".

Click "Mark All".

A dialog box should pop up saying "X matches.".

Go to the main menu: "Search"->"Bookmark"->"Remove Unmarked Lines".

Your document should now be reduced to the lines you're interested in.

Go to "Replace" (Ctrl-H).

Into "Find what", enter the same regex as above.

EDIT: Insert the following into "Replace with":

\2 | \1

/EDIT

Select "Regular expression" as your search mode.

Click "Replace All".

A dialog box should pop up saying "X occurrences replaced."

When I do this with the source text of the following page: List of Angelic Layer characters (http://en.wikipedia.org/wiki/List_of_Angelic_Layer_characters)

I get the following result:

Suzuhara Misaki | 鈴原 みさき
Mihara Ichirō | 三原 一郎
Kobayashi Hatoko | 小林 鳩子
Kobayashi Kōtarō | 小林 虎太郎
Kizaki Tamayo | 木崎 珠代
Mihara Ōjirō | 三原 王二郎
Asami Shōko | 浅見 祥子
Suzuhara Shūko | 鈴原 萩子
Jōnouchi Sai | 城乃内 最
Saitō Kaede | 斉藤 楓
Seto Ringo | 瀬戸 林子
Fujisaki Madoka | 藤崎 円香
Fujisaki Arisu | 藤崎 有栖
Ogata Masaharu | 尾形 雅治
Fujimori Hiromi | 藤森 ひろみ
Inada Yūko | 稲田 夕子
Inada Shūji | 稲田 修二
Shikaisha | 司会者
Kyōko | 京子
Kitamura Asuka | 北村 飛鳥
Hikawa Yūko | 氷川 優子
Yamada Tomoko | 山田 知子
Shibata Maria | 柴田 まりあ
Misaki Ryō | 岬 了
Jōnouchi Rin | 城乃内 鈴
Tanaka Chitose | 田中 千歳
Tsubasa Makkenjī | つばさ・マッケンジー
Hikaru | ヒカル
Suzuka | 鈴鹿
Shirahime | 白姫
Buranshe | ブランシェ
Ranga | ランガ
Wizādo | ウィザード
Atena | アテナ
Mao | 猫

What you hadn't taken into account with your original regex were the characters from the line's beginning to the pattern's beginning and from the pattern's end to the line's end.
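If you ever want to script this instead of clicking through the dialogs, the steps above can be sketched roughly like this in Python. The function name `extract_pairs` and the sample markup are made up for illustration, and Python's `re` is not the engine NP++ uses, so treat it as a sketch of the idea only.

```python
import re

# Same regex as in the "Find what" box above, split over lines for readability.
PATTERN = re.compile(
    r'^.*?xml:lang="ja">(.*?)</span>'
    r'<span class="t_nihongo_comma" style="display:none">,</span> '
    r'<i>(.*?)</i>.*?$'
)

def extract_pairs(html):
    """Keep only the matching lines and rewrite each as 'English | Japanese'."""
    result = []
    for line in html.splitlines():
        match = PATTERN.search(line)
        if match:  # roughly "Mark All" plus "Remove Unmarked Lines"
            japanese, english = match.group(1), match.group(2)
            result.append(f"{english} | {japanese}")  # same as "\2 | \1"
    return result

# Illustrative input: one line without a name, two lines with one.
sample = """<p>Intro text without names.</p>
<dt>Misaki (<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">鈴原 みさき</span><span class="t_nihongo_comma" style="display:none">,</span> <i>Suzuhara Misaki</i>)</dt>
<dt>Icchan (<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">三原 一郎</span><span class="t_nihongo_comma" style="display:none">,</span> <i>Mihara Ichirō</i>)</dt>"""

print("\n".join(extract_pairs(sample)))
```

The leading `^.*?` and trailing `.*?$` do the same job here as in NP++: they swallow everything on the line around the two captured names, so only the names survive the rewrite.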

I have looked into padding the list to make it look better, but I don't think you can do that with just regex in NP++. If you want to experiment a bit, this looks promising: http://stackoverflow.com/questions/14878571/add-trailing-zeroes-to-line-in-notepad
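Outside NP++ the padding is not hard. A small Python sketch, assuming the list has already been reduced to "English | Japanese" lines (the helper name `pad_columns` is made up):

```python
def pad_columns(rows, sep=" | "):
    """Left-justify the first column so the separators line up."""
    pairs = [row.split(sep, 1) for row in rows]
    width = max(len(left) for left, _ in pairs)
    return [f"{left.ljust(width)}{sep}{right}" for left, right in pairs]

rows = [
    "Suzuhara Misaki | 鈴原 みさき",
    "Mihara Ichirō | 三原 一郎",
    "Kobayashi Hatoko | 小林 鳩子",
]
padded = pad_columns(rows)
print("\n".join(padded))
```

Only the left (Latin) column is padded, so the double-width Japanese characters do not affect the alignment. Every row has to contain the separator, otherwise the split will not produce a pair.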

I hope this works for you and is what you need, otherwise feel free to ask more. I won't have access to the Windows machine (and therefore NP++) tomorrow, but I can try again on Saturday.

yurihellsing · May 23, 2013

Ok, when I hit replace it just clears everything, leaving nothing. Do I need to put something in the "Replace with" box?

alpenwasser · May 23, 2013

Ah crap, *facepalm*, sorry! Yes, of course, insert the following into the "Replace with" box:

\2 | \1

yurihellsing · May 23, 2013

And if I want to reverse the order to JP/ENG and add " " around the names, so it looks like "JPname"|"ENGname"?

alpenwasser · May 23, 2013

"\1" | "\2"

The \1 holds the Japanese version, and the \2 holds the English version. When you look at the regular expression:

^.*?xml:lang="ja">(.*?)</span><span class="t_nihongo_comma" style="display:none">,</span> <i>(.*?)</i>.*?$

Notice the (.*?) where the Japanese and English matches are. That means that anything which matches for that part of the regex will be stored in a numbered variable that can be used with \1, \2, \3 etc. They are numbered in the order they appear in the original regex, and since the Japanese version comes first, that is stored in \1, and the English version goes into \2.

Note that in many other tools (for example when the expression sits inside a double-quoted string in a script or on a command line), the " would need to be escaped like so:

\"\1\" | \"\2\"

But NP++ apparently doesn't need this. Just keep in mind that if you ever need to use regex in another program and it doesn't work, an unescaped character is often the source of the error.
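To make the group numbering and the quoting concrete, here is a small Python sketch. The sample line is illustrative, and the point about escaping applies to Python string literals, not necessarily to other tools.

```python
import re

# Illustrative line in the same markup as the Wikipedia list.
line = (
    '<dt>(<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">三原 王二郎</span>'
    '<span class="t_nihongo_comma" style="display:none">,</span> '
    '<i>Mihara Ōjirō</i>)</dt>'
)

pattern = re.compile(
    r'^.*?xml:lang="ja">(.*?)</span>'
    r'<span class="t_nihongo_comma" style="display:none">,</span> '
    r'<i>(.*?)</i>.*?$'
)

match = pattern.search(line)
print(match.group(1))  # first parenthesised group: the Japanese name
print(match.group(2))  # second parenthesised group: the English name

# Single-quoted literal: the double quotes need no escaping.
quoted_a = pattern.sub(r'"\1" | "\2"', line)
# Double-quoted literal: the quotes are escaped for Python, not for the regex.
quoted_b = pattern.sub("\"\\1\" | \"\\2\"", line)
assert quoted_a == quoted_b
print(quoted_a)  # "三原 王二郎" | "Mihara Ōjirō"
```

Both replacement strings end up identical once Python has read them. The backslashes in front of the quotes are only there to get the quotes through the string literal.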

yurihellsing · May 23, 2013

Thank you ever soooooooo much!!!!!!!!!!!!!!!!!!!!!!!!

alpenwasser · May 23, 2013

Happy to help. :)

yurihellsing · May 24, 2013

One last thing: how would I remove the space in the JP column?

alpenwasser · May 24, 2013

You mean to get this:

Suzuhara Misaki |鈴原 みさき

instead of this:

Suzuhara Misaki | 鈴原 みさき

?

Simply remove the space from the replacement expression:

"\1" |"\2"

Spaces are translated 1:1 from the "Replace with" expression into the final result.

Or are you talking about removing the spaces from within the JP expressions, so this:

Mihara Ōjirō | 三原王二郎

instead of this:

Mihara Ōjirō | 三原 王二郎

I think that one would be quite a bit trickier, but I could look into it tomorrow. I'm not sure if my regex-fu is strong enough for that but I'll give it a shot if that's the desired result.

yurihellsing · May 24, 2013

alpenwasser wrote:

Or are you talking about removing the spaces from within the JP expressions, so this:
Mihara Ōjirō | 三原王二郎
instead of this:
Mihara Ōjirō | 三原 王二郎
I think that one would be quite a bit trickier, but I could look into it tomorrow. I'm not sure if my regex-fu is strong enough for that but I'll give it a shot if that's the desired result.

This is what I'm after. If it makes things easier, the end file will be a .txt.

alpenwasser · May 24, 2013

Ok I'll have a look and get back to you.

yurihellsing · May 25, 2013

Thank you!

yurihellsing · May 26, 2013

Found out how to do it by this (the fixes):

Replace this:

\s+(?=\w+":")

with this:

$1$2$3
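A rough Python sketch of what that expression appears to do. The input format "Japanese":"English" is my assumption, read off from the `":"` in the lookahead. Since the pattern has no capture groups, `$1$2$3` presumably comes out empty in NP++; Python would raise an error for a missing group, so the sketch replaces with an empty string instead.

```python
import re

# Assumed file format: "Japanese":"English", one pair per line.
lines = [
    '"三原 王二郎":"Mihara Ōjirō"',
    '"鈴原 みさき":"Suzuhara Misaki"',
]

# \s+ matches the whitespace; the lookahead (?=\w+":") only checks that
# word characters followed by ":" come next, without consuming them.
space_before_key_end = re.compile(r'\s+(?=\w+":")')

cleaned = [space_before_key_end.sub('', line) for line in lines]
print("\n".join(cleaned))
# "三原王二郎":"Mihara Ōjirō"
# "鈴原みさき":"Suzuhara Misaki"
```

The space inside the English name is left alone because the word after it is followed by a closing quote only, not by `":"`. A Japanese name with more than one space would only lose its last space in a single pass.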

yurihellsing · May 27, 2013

Okay, I'm trying something new, as I want to get the ENG from a different part of the line. I tried to copy what you did with the last one, but it doesn't seem to work for me (the site: http://en.wikipedia.org/wiki/List_of_Strike_Witches_characters).

Original line

<dt><span id="Yoshika_Miyafuji">Yoshika Miyafuji</span> <span style="font-weight: normal">(<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">宮藤 芳佳</span><span class="t_nihongo_comma" style="display:none">,</span> <i>Miyafuji Yoshika</i><span class="t_nihongo_help noprint"><sup><a href="/wiki/Help:Installing_Japanese_character_sets" title="Help:Installing Japanese character sets"><span class="t_nihongo_icon" style="color: #00e; font: bold 80% sans-serif; text-decoration: none; padding: 0 .1em;">?</span></a></sup></span>)</span></dt>

my attempt

^.*?span id="Yoshika_Miyafuji">(.*?)</span> <span style="font-weight: normal">(<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">(.*?)</span>.*?$

What am I doing wrong?

alpenwasser · May 28, 2013

I am terribly sorry about not getting back sooner; my weekend turned into quite a jumbly mess (the good kind, but very hectic and exhausting).

Anyway:

yurihellsing wrote:

Found out how to do it by this (the fixes):
Replace this: \s+(?=\w+":")
with this: $1$2$3

That's great!

I will take a look at your new problem later today, promise! ;)

EDIT: Just noticed this while having a quick look at it: You have a ( in your regex. This is a special character and needs to be escaped. Try this:

^.*?span id="Yoshika_Miyafuji">(.*?)</span> <span style="font-weight: normal">\(<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">(.*?)</span>.*?$

When I try this on your sample line, I get:

"Yoshika Miyafuji" | "宮藤 芳佳"

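The effect of the unescaped ( can be reproduced in Python, as a rough sketch. The sample line is shortened from the one posted above, and Python's `re` reports the problem as an error, which may not be how NP++ reacts.

```python
import re

# Shortened version of the sample line posted above.
line = (
    '<dt><span id="Yoshika_Miyafuji">Yoshika Miyafuji</span> '
    '<span style="font-weight: normal">(<span class="t_nihongo_kanji" '
    'lang="ja" xml:lang="ja">宮藤 芳佳</span>'
    '<span class="t_nihongo_comma" style="display:none">,</span> '
    '<i>Miyafuji Yoshika</i>)</span></dt>'
)

# The original attempt, with the literal "(" left unescaped.
unescaped = (
    r'^.*?span id="Yoshika_Miyafuji">(.*?)</span> '
    r'<span style="font-weight: normal">(<span class="t_nihongo_kanji" '
    r'lang="ja" xml:lang="ja">(.*?)</span>.*?$'
)
# The corrected version: the same text with that one "(" written as "\(".
escaped = unescaped.replace('normal">(', r'normal">\(')

try:
    re.compile(unescaped)
    compiled_ok = True
except re.error:
    compiled_ok = False  # the stray "(" opens a group that is never closed

result = re.sub(escaped, r'"\1" | "\2"', line)
print(compiled_ok)  # False
print(result)       # "Yoshika Miyafuji" | "宮藤 芳佳"
```

Note that the literal id="Yoshika_Miyafuji" only matches this one character. To cover the whole list, that part would presumably have to become something like id="[^"]*", which is my guess and has not been tested against the page.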
yurihellsing · May 28, 2013

Thanks again!
