Extra|Delete Duplicate Lines

Select the Delete Duplicate Lines item in the Extra menu to delete lines with (nearly) identical text on them. The status bar will indicate how many lines were deleted. You can make a number of choices as to what EditPad Pro will consider a duplicate line.

Scope

If you've selected part of the file before using the Delete Duplicate Lines command, you can limit the command to delete only lines that are selected. If the first and/or last line in the selection are only partially selected, the selection will be expanded to include them entirely. If the selection is rectangular, lines covered by the selection will be deleted entirely.

Proximity of Duplicate Lines

Select "anywhere in the scope" to delete all lines that are duplicated anywhere. The first copy of the line will remain, while all the others will be deleted. If you've set the scope to "selected lines", the lines must be duplicated inside the selection. Lines that are not duplicated inside the selection will not be deleted, even if they have duplicates outside the selection.

Select "adjacent lines only" if you only want to delete a line's duplicates when they're immediately below the line they duplicate, without any other lines between them. If the file's lines are sorted alphabetically, the end result of "anywhere in the scope" and "adjacent lines only" will be the same, because sorting places all duplicates next to each other. However, "adjacent lines only" will delete the duplicate lines significantly faster, especially when the number of lines in the file is large. If you select "anywhere in the scope", EditPad Pro has to compare each line with every other line in the file.
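
The difference between the two proximity settings can be illustrated with a short Python sketch. This is not EditPad Pro's implementation; the function names are hypothetical, and the "anywhere" variant uses a set lookup for brevity rather than the line-by-line comparison described above.

```python
def delete_duplicates_anywhere(lines):
    """Keep the first copy of every line; drop later copies wherever they occur."""
    seen = set()   # assumption: a set lookup stands in for comparing against every line
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


def delete_duplicates_adjacent(lines):
    """Drop a line only when it repeats the line directly above it."""
    kept = []
    for line in lines:
        if not kept or line != kept[-1]:
            kept.append(line)
    return kept


unsorted_lines = ["pear", "apple", "pear", "pear", "apple"]
print(delete_duplicates_anywhere(unsorted_lines))          # ['pear', 'apple']
print(delete_duplicates_adjacent(unsorted_lines))          # ['pear', 'apple', 'pear', 'apple']
print(delete_duplicates_adjacent(sorted(unsorted_lines)))  # ['apple', 'pear']
```

On the sorted input both functions leave the same lines, which matches the behavior described for sorted files.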

Comparison Options

By turning on one or more comparison options, you can tell EditPad Pro to consider lines as duplicates even when they aren't identical.

The "compare selected columns only" option is only available when you've made a selection that does not span more than one line, or when you've made a rectangular selection. With this option, EditPad Pro will only compare the selected columns. E.g. if the selection spans from column 10 to column 20, EditPad Pro will compare columns 10 through 20 of each line. If a line has fewer than 10 characters, it will be considered blank. This has important consequences (see the "Blank Lines" section below).
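
One way to think about the comparison options is that each line is reduced to a comparison key, and two lines count as duplicates when their keys are equal. The Python sketch below covers the column option above as well as the whitespace and case options described in the rest of this section. The function name, its parameters, and the order in which the options are applied are assumptions made for illustration, not EditPad Pro's actual behavior.

```python
def comparison_key(line, columns=None, ignore_leading=False,
                   ignore_trailing=False, ignore_all_whitespace=False,
                   ignore_case=False):
    """Reduce a line to the text that is actually compared (hypothetical helper)."""
    if columns is not None:
        first, last = columns          # 1-based, inclusive, e.g. (10, 20)
        line = line[first - 1:last]    # a line shorter than `first` yields ""
    if ignore_all_whitespace:
        # Removes spaces and tabs everywhere, including the middle of the line.
        line = line.replace(" ", "").replace("\t", "")
    else:
        if ignore_leading:
            line = line.lstrip(" \t")
        if ignore_trailing:
            line = line.rstrip(" \t")
    if ignore_case:
        line = line.casefold()
    return line


print(repr(comparison_key("short", columns=(10, 20))))                     # ''
print(repr(comparison_key("0123456789ABCDEFGHIJKLMN", columns=(10, 20))))  # '9ABCDEFGHIJ'
print(repr(comparison_key("  Hello World\t", ignore_leading=True)))        # 'Hello World\t'
print(repr(comparison_key("a b\tc", ignore_all_whitespace=True)))          # 'abc'
```

The first call shows why a line shorter than the leftmost selected column compares as blank: nothing is left to compare.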

"Ignore differences in leading spaces and tabs" will treat lines that only differ in the number of spaces and tabs at the start of the line as duplicates. Similarly, "ignore trailing spaces and tabs" ignores differences in spaces and tabs at the end of each line. "Ignore all differences in spaces and tab" is more than a combination of the two previous options. EditPad Pro will then completely ignore all spaces and tabs, including spaces and tabs in the middle of lines.

"Ignore difference in case" compares lines without regard to the difference between upper case and lower case letters.

Lines to Delete

You must select one or two choices in the "lines to delete" section. Every line in the file belongs to one of the 3 categories. Selecting none of the options would have no effect, and selecting all of them would delete all the lines in the file.

Turn on "2nd and following occurrences of duplicate lines" and turn off the other two options to delete all duplicate lines, leaving only unique lines in the file, regardless of whether they were previously unique. Use this to delete unnecessary duplicates from a file.

Turn on both "2nd..." and "1st occurrence of duplicate lines" to delete all duplicate lines, leaving only lines that were previously unique.

Turn on both "2nd..." and "non-duplicated lines" to leave only one copy of all lines that had duplicates. If you paste the contents of two lists that consist of unique lines (when viewed separately) into a file in EditPad, then you can use this combination to get the lines that occurred in both files, but not the lines that occurred in only one of the files.
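
The three categories and the combinations above can be sketched in Python as follows. This is an illustration under the "anywhere in the scope" setting with exact comparison; the names `classify` and `delete_lines` and the category constants are hypothetical.

```python
from collections import Counter

NON_DUPLICATED = "non-duplicated lines"
FIRST = "1st occurrence of duplicate lines"
LATER = "2nd and following occurrences of duplicate lines"


def classify(lines):
    """Assign each line to exactly one of the three categories."""
    counts = Counter(lines)
    seen = set()
    categories = []
    for line in lines:
        if counts[line] == 1:
            categories.append(NON_DUPLICATED)
        elif line not in seen:
            seen.add(line)
            categories.append(FIRST)
        else:
            categories.append(LATER)
    return categories


def delete_lines(lines, delete):
    """Remove every line whose category is in the `delete` set."""
    return [line for line, category in zip(lines, classify(lines))
            if category not in delete]


lines = ["a", "b", "a", "c", "b", "a"]
print(delete_lines(lines, {LATER}))                  # ['a', 'b', 'c']
print(delete_lines(lines, {LATER, FIRST}))           # ['c']
print(delete_lines(lines, {LATER, NON_DUPLICATED}))  # ['a', 'b']
```

The three calls correspond to the three combinations described above: one copy of every line, only the previously unique lines, and one copy of each line that had duplicates.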

If you want to keep only lines that occur a certain number of times, use the Delete Duplicate Lines command several times. E.g. if you only want lines that occur 4 times or more, use it twice with the "1st occurrence..." and "non-duplicated..." options turned on. Then use it again with the "2nd occurrence..." and "non-duplicated..." options. Each of the first two passes removes one copy of every duplicated line. The first time you delete the lines that occur only once, the second time you delete lines that occur only twice, and the third time you delete the lines that occur only three times along with the remaining duplicates of lines that occur four times or more.
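
The effect of repeated passes is easiest to check by tracing them on a small list. The Python sketch below models the three categories under the "anywhere in the scope" setting and applies the sequence of passes described above; the helper names are hypothetical, and the model is an assumption about how the categories interact, not EditPad Pro's code.

```python
from collections import Counter

NON_DUPLICATED, FIRST, LATER = "non-duplicated", "1st occurrence", "2nd and following"


def one_pass(lines, delete):
    """Apply one Delete Duplicate Lines pass, removing the given categories."""
    counts = Counter(lines)
    seen = set()
    kept = []
    for line in lines:
        if counts[line] == 1:
            category = NON_DUPLICATED
        elif line not in seen:
            seen.add(line)
            category = FIRST
        else:
            category = LATER
        if category not in delete:
            kept.append(line)
    return kept


# "a" occurs once, "b" twice, "c" three times, "d" four times, "e" five times.
lines = ["a"] + ["b"] * 2 + ["c"] * 3 + ["d"] * 4 + ["e"] * 5

after_1 = one_pass(lines, {FIRST, NON_DUPLICATED})
after_2 = one_pass(after_1, {FIRST, NON_DUPLICATED})
after_3 = one_pass(after_2, {LATER, NON_DUPLICATED})

print(Counter(after_1))  # b: 1, c: 2, d: 3, e: 4
print(Counter(after_2))  # c: 1, d: 2, e: 3
print(after_3)           # ['d', 'e']
```

Each of the first two passes removes one copy of every remaining duplicated line, so in this trace the lines left after the third pass are the ones that started with four or more copies.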

Blank Lines

Since blank lines are technically all duplicates of each other, EditPad Pro offers you an extra choice for blank lines. You can choose to delete all blank lines, to not delete any blank lines, or to delete only duplicate blank lines. The "duplicate blank lines" option takes the "proximity" setting into account: it either deletes all blank lines but the first ("anywhere in the scope"), or replaces each run of consecutive blank lines with a single blank line ("adjacent lines only").

If you've turned on the "compare selected columns only" option, a line may be considered blank even when it isn't. If a line is shorter than the leftmost column in the selection, it is considered to be blank, even if it does have text on it.

Lines with only spaces and tabs on them are only considered to be blank if you've turned on one of the options to ignore differences in spaces and tabs. On a line with only spaces and tabs, all spaces and tabs are considered to be both leading and trailing at the same time.
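
The blank line rules in this section can be summarized in a short Python sketch. It only models the handling of blank lines, not the deletion of other duplicates. The names `is_blank` and `filter_blank_lines`, the mode strings, and the parameters are hypothetical.

```python
def is_blank(line, ignore_whitespace=False, first_column=None):
    """Decide whether a line counts as blank under the chosen options."""
    if first_column is not None:
        # "Compare selected columns only": text left of the selection is ignored,
        # so a line shorter than the leftmost column has nothing left to compare.
        line = line[first_column - 1:]
    if ignore_whitespace:
        # Any of the options that ignore spaces and tabs makes a whitespace-only line blank.
        line = line.strip(" \t")
    return line == ""


def filter_blank_lines(lines, mode, adjacent_only=False, **blank_options):
    """mode is "all", "none" or "duplicates" (hypothetical names for the three choices)."""
    if mode not in ("all", "none", "duplicates"):
        raise ValueError("unknown mode: " + mode)
    kept = []
    blank_kept = False
    for line in lines:
        if not is_blank(line, **blank_options):
            kept.append(line)
            if adjacent_only:
                blank_kept = False   # a new run of blank lines may start after this line
            continue
        if mode == "none" or (mode == "duplicates" and not blank_kept):
            kept.append(line)
            blank_kept = True
    return kept


lines = ["a", "", "", "b", "", "c"]
print(filter_blank_lines(lines, "all"))                             # ['a', 'b', 'c']
print(filter_blank_lines(lines, "duplicates"))                      # ['a', '', 'b', 'c']
print(filter_blank_lines(lines, "duplicates", adjacent_only=True))  # ['a', '', 'b', '', 'c']
print(is_blank(" \t"), is_blank(" \t", ignore_whitespace=True))     # False True
print(is_blank("short text", first_column=20))                      # True
```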

See Also

Extra menu
Extra|Delete Blank Lines
Extra|Consolidate Blank Lines