Extra|Delete Duplicate Lines

Select the Delete Duplicate Lines item in the Extra menu to delete lines with (nearly) identical text on them. The status bar indicates how many lines were deleted. You can make a number of choices as to what EditPad Pro considers to be a duplicate line.

Scope

If you’ve selected part of the file before using the Delete Duplicate Lines command, you can limit the command to the selected lines. If the first or last line in the selection is only partially selected, the selection is expanded to include that line entirely. If the selection is rectangular, the scope consists of the lines it covers, and duplicates among them are deleted as whole lines, not just the part inside the rectangle.

Proximity of Duplicate Lines

Select “anywhere in the scope” to delete all lines that are duplicated anywhere. The first copy of the line remains. All the others are deleted. If you’ve set the scope to “selected lines” then the lines must be duplicated inside the selection. Lines that are not duplicated inside the selection are not deleted, even if they have duplicate lines outside the selection.

Select “adjacent lines only” if you only want to delete a line’s duplicates when they’re immediately below the line they duplicate, without any other lines between them. If the file’s lines are sorted alphabetically, all duplicates are sorted together, so the end result of “anywhere in the scope” and “adjacent lines only” is the same. Selecting “adjacent lines only”, however, deletes the duplicate lines significantly faster, particularly when the number of lines is large. If you select “anywhere in the scope”, EditPad Pro has to compare each line with every other line in the scope.
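The difference between the two proximity settings can be illustrated with a short Python sketch. This is not EditPad Pro's implementation; the function names are made up for illustration, lines are compared for exact equality, and the special handling of blank lines is left out.

```python
def dedupe_anywhere(lines):
    """Keep the first copy of each line; drop later copies wherever they occur."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


def dedupe_adjacent(lines):
    """Drop a line only when it repeats the line directly above it."""
    kept = []
    for line in lines:
        if not kept or line != kept[-1]:
            kept.append(line)
    return kept
```

On unsorted input the two functions give different results, because a copy separated from the original by another line is only caught by the "anywhere" variant. On sorted input they return the same list.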

Comparison Options

By turning on one or more comparison options, you can tell EditPad Pro to consider lines as duplicates even when they aren’t identical.

The option “compare selected columns only” is only available when you’ve made a selection that does not span more than one line, or when you’ve made a rectangular selection. With this option, EditPad Pro only compares the selected columns. If the selection spans from column 10 to column 18, for example, EditPad Pro compares columns 10 through 18 of each line. If those 9 characters are the same, the lines are considered to be duplicates. If a line has fewer than 10 characters, it is considered to be blank. This has important consequences (see the Blank Lines section below).
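As a rough illustration of the column comparison, the following Python sketch extracts the text that would be compared for a given column range. The helper name column_key is hypothetical, and the sketch assumes that every character occupies exactly one column (tabs are not expanded), which may differ from how EditPad Pro counts columns.

```python
def column_key(line, first_col, last_col):
    """Return the text in 1-based columns first_col..last_col.

    A line shorter than first_col has nothing in the compared columns,
    so it yields an empty key and is treated like a blank line.
    """
    if len(line) < first_col:
        return ""
    return line[first_col - 1:last_col]
```

Two lines would then count as duplicates when their keys are equal, regardless of the text outside the selected columns.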

“Ignore differences in leading spaces and tabs” treats lines that only differ in the number of spaces and tabs at the start of the line as duplicates. Similarly, “ignore trailing spaces and tabs” ignores differences in spaces and tabs at the end of each line. “Ignore all differences in spaces and tabs” is more than a combination of the two previous options. EditPad Pro then completely ignores all spaces and tabs, including spaces and tabs in the middle of lines.

“Ignore difference in case” compares lines without regard to the difference between uppercase and lowercase letters.
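One way to picture the whitespace and case options is as a normalization step applied to each line before comparison. The Python sketch below is an assumption about the behavior described above, not EditPad Pro's actual code; the function and parameter names are invented for illustration, and lowercasing with lower() is only an approximation of case-insensitive comparison.

```python
def comparison_key(line, ignore_leading=False, ignore_trailing=False,
                   ignore_all_whitespace=False, ignore_case=False):
    """Return the form of the line that is compared against other lines."""
    if ignore_all_whitespace:
        # Removes spaces and tabs everywhere, including mid-line.
        line = line.replace(" ", "").replace("\t", "")
    else:
        if ignore_leading:
            line = line.lstrip(" \t")
        if ignore_trailing:
            line = line.rstrip(" \t")
    if ignore_case:
        line = line.lower()
    return line
```

With this model, "ignore all differences in spaces and tabs" makes "a b" and "ab" duplicates, which neither the leading nor the trailing option does on its own.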

Lines to Delete

You have to select one or two choices in the “lines to delete” section. Every line in the file belongs to one of the 3 categories. Selecting none of the options would have no effect. Selecting all of them would delete all the lines in the file.

Turn on “2nd and following occurrences of duplicate lines” and turn off the other two options to delete every copy of a line except the first. Each distinct line then remains in the file exactly once, regardless of whether it was previously unique. Use this to delete unnecessary duplicates from a file.

Turn on both “2nd...” and “1st occurrence of duplicate lines” to delete every copy of each duplicated line, leaving only lines that were previously unique.

Turn on both “2nd...” and “non-duplicated lines” to leave only one copy of all lines that had duplicates. If you paste the contents of two lists that consist of unique lines (when viewed separately) into a file in EditPad Pro, then you can use this combination to get the lines that occurred in both lists, but not the lines that occurred in only one of them.
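The three categories and the option combinations can be modeled with a small Python sketch. It assumes the “anywhere in the scope” setting and exact line comparison, ignores the separate blank-line choice, and uses invented names (delete_lines, first, later, unique) that do not come from EditPad Pro.

```python
from collections import Counter


def delete_lines(lines, first=False, later=False, unique=False):
    """Delete lines by category.

    first  -- delete the 1st occurrence of duplicated lines
    later  -- delete the 2nd and following occurrences of duplicated lines
    unique -- delete lines that have no duplicates
    """
    counts = Counter(lines)
    seen = set()
    kept = []
    for line in lines:
        if counts[line] == 1:
            delete = unique
        elif line in seen:
            delete = later
        else:
            delete = first
        seen.add(line)
        if not delete:
            kept.append(line)
    return kept
```

For example, with two lists of unique lines pasted one after the other, calling delete_lines(lines, later=True, unique=True) leaves one copy of each line that appeared in both lists.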

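If you want to keep only lines that occur a certain number of times, use the Delete Duplicate Lines command several times. If you only want lines that occur 3 times or more, for example, use it twice with the “1st occurrence...” and “non-duplicated...” options turned on. Then use it again with only the “2nd occurrence...” option turned on. The first time you delete the lines that occur only once, along with one copy of every other line. The second time you delete the lines that occurred only twice. The third time you delete the remaining duplicates of lines that occurred four times or more, leaving one copy of each line that occurred at least 3 times. Do not turn on “non-duplicated...” for the third pass: a line that occurred exactly 3 times has only one copy left by then, so it would be deleted as well.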
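The multi-pass procedure for keeping lines that occur 3 times or more can be checked with a short Python simulation. It is a sketch under the same assumptions as the option descriptions above (“anywhere in the scope”, exact comparison, no blank-line handling), and the names run_pass and keep_three_or_more are hypothetical. Note that the third pass here deletes only the 2nd and following occurrences; deleting non-duplicated lines in that pass as well would also remove lines that occurred exactly three times.

```python
from collections import Counter


def run_pass(lines, delete):
    """Run one pass; `delete` is a set containing any of
    "first", "later" and "unique" (the three line categories)."""
    counts = Counter(lines)
    seen = set()
    kept = []
    for line in lines:
        if counts[line] == 1:
            category = "unique"
        elif line in seen:
            category = "later"
        else:
            category = "first"
        seen.add(line)
        if category not in delete:
            kept.append(line)
    return kept


def keep_three_or_more(lines):
    lines = run_pass(lines, {"first", "unique"})  # drops lines that occurred once
    lines = run_pass(lines, {"first", "unique"})  # drops lines that occurred twice
    return run_pass(lines, {"later"})             # one copy of each remaining line
```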

Blank Lines

Since blank lines are technically all duplicates of each other, EditPad Pro offers you an extra choice for blank lines. You can choose to delete all blank lines, to delete no blank lines, or to delete only duplicate blank lines. The “duplicate blank lines” option takes into account the “proximity” setting. It either deletes all blank lines except the first one (“anywhere in the scope”), or replaces each run of consecutive blank lines with a single blank line (“adjacent lines only”).
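The effect of the “duplicate blank lines” choice under the two proximity settings can be sketched in Python as follows. The function name is made up for illustration, only completely empty lines are treated as blank, and non-blank lines are left untouched so that the blank-line behavior is shown in isolation.

```python
def delete_duplicate_blank_lines(lines, adjacent_only):
    """Remove duplicate blank lines, leaving all non-blank lines in place."""
    kept = []
    seen_blank = False
    for line in lines:
        if line == "":
            if adjacent_only:
                # Duplicate only if the previous kept line is also blank.
                is_duplicate = bool(kept) and kept[-1] == ""
            else:
                # Duplicate if any blank line occurred earlier.
                is_duplicate = seen_blank
            seen_blank = True
            if is_duplicate:
                continue
        kept.append(line)
    return kept
```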

If you’ve turned on the “compare selected columns only” option, a line may be considered blank even when it isn’t. If a line is shorter than the leftmost column in the selection, it is considered to be blank, even if it does have text on it.

Lines with only spaces and tabs on them are only considered to be blank if you’ve turned on one of the options to ignore differences in spaces and tabs. On a line with only spaces and tabs, all spaces and tabs are considered to be both leading and trailing at the same time.

See Also

Extra menu
Extra|Delete Blank Lines
Extra|Consolidate Blank Lines