Removing a non-repeating list of records in a list with repeated records in b ..

Home » Computers and Technology » Removing a non-repeating list of records in a list with repeated records in b ..
Computers and Technology No Comments

* Summary

* Problem Statement

* Aspects to consider

* An application subject to the procedure

* References Cited


There are often repeated records in Excel columns and is often required to know how many and which are different records in a particular column. This paper presents a method that solves this problem. The procedure is explained with an example in four easy steps and combines the application of the statistical function CONTAR.SI, copying formulas with absolute cell references relative and Filter tool from the Data menu. The procedure is quite useful because there is no tool to do this in Excel directly. An application example is in collections databases of organisms.

Keywords: Excel, Excel Formulas, Filter, CONTAR.SI, Databases, Applications. Biological collections.

Problem Statement

Often using the Excel spreadsheet to store information. Also, Excel has some tools to manipulate and analyze relatively quickly the information. To cite some of these elements can be named as Sort from the Data menu, Filter, Text to columns, validation, etc., And are also very useful, Statistical functions, Text, Search and Reference, Information, logic, etc.

Sometimes, the direct use of a single tool (either a menu item or some function data) throws us exactly the result we want. However, in other cases it is necessary to combine several of these tools to get the result you really want. Suppose we have a database of locations within a country, such as Colombia (Figure 1). Each row corresponds to a different location, and each column is a different type of information: Department, County, City, Latitude and Longitude. Now, suppose you want to get the fastest possible all locations del Cauca Department. This is easily done in a straightforward manner and with the Filter option from the Data menu (see eg, Rivera-Gutierrez, 2004). Note in the database as Figure 1 that there are several locations in the same municipality, then the names of municipalities and departments are repeated more than once in their respective columns. Given this, suppose now we have asked for a list of all the different departments that exist in our database. A list with all departments contents but not repeated.

Figure 1. Database in Excel with locations in different departments of Colombia.

This situation repeated records in one or more columns of the databases is prevalent, and often we want to know how many and which different items (without repeating) is in a particular column. This requirement does not fulfill any tools in Excel directly. And unless you want to do it manually, which can be very tedious if our database has many rows, you can use the procedure described in this article. This procedure basically combines two tools: CONTAR.SI Statistical Function and Filter Data menu option. Therefore, for their understanding and application must know enter and copy formulas in the cells of the spreadsheet and differentiate between absolute and relative references to cells, and know how to use the Filter option from the Data menu. If you have no knowledge on these topics, please consult the help of Excel. On the Internet you can also get other sites with good Excel courses (eg Accessed December 30, 2004, http://www . / accessed December 30, 2004). Garcia de Jalon, Fernndez-Caballero & Garcia-Martos (2001) is another full course. Rivera-Gutierrez (2004) illustrates applications Filter tool (Data menu) and other functions.

The images of the figures in this article are in Excel with Windows XP and some have been deleted status bars and formulas for more coverage of the image.


The procedure will be illustrated with the example of Figure 1 and the problem posed in the introduction. It requires a list with each of the departments that have the database, but without any recurrence. The procedure is to have top-down (or bottom-up, no matter the order) the number of times to repeat each record (required column) and counted by excluding records. After using the Filter (Data menu) to filter records with a single repetition. The resulting list of said filter is desired. The procedure can be divided into four steps:

1) is chosen to enter an empty column CONTAR.SI formula. In our example, this column will be the G and we will put the title “Count” (Note that all columns in the database have a title). The function has a syntax CONTAR.SI with two types of information: a range of cells and a criterion for counting. This function counts the cells in the range that meet the criteria given. In the case of our example, the range is B2: B34 that corresponds to the column of the departments and the criterion is the information contained in each cell of the same rank, in the case of formula entered in cell G2, the criterion is cell B2 (Figure 2). As you may know, the formulas can be entered either with the help of the same Excel or manually from the keyboard. The first is recommended for low-skilled, any formula is entered from the Insert menu (toolbar) and choosing function.

Returning to Figure 2, after entering CONTAR.SI function with its range and criteria, and typing ENTER, throw resulting in cell G2, the number of times to repeat the word “Amazon” in the range specified cell (B2: B34). This is 5 (Figure 3).

Figure 2. Entering Formula CONTAR.SI, and allocation range.

To view the graph select the “Download” top menu

Figure 3. CONTAR.SI function result to the first cell.

2) Now you want to copy the formula down but soon entered should be aware that when you copy a formula cell references may change depending on the cell reference is used. If the cell references are absolute, to copy the formula to other cells, no change in the cells that the formula is referring. But if you use relative references, if they change.

In our case, we must use a reference type that gets copied formula in each cell of their count excludes cells that are above it and include both the same cell as the cells below it. Thus, the content of each cell will be counted only once from top to bottom. This is achieved by adding a dollar sign ($) between B and 34 in the syntax of the formula in cell G2, as follows:

= (B2: B $ 34, B2)

To write the dollar sign ($) after entering the formula, we stopped at the desired cell, click on the formula bar and type the $ sign at the required location. (Figure 4).

The $ sign B34 cell remains fixed as you copy the formula down, likewise, this sign is omitted deliberately position range B2 CONTAR.SI to function to copy the formula in cell G2 downwards , the first cell in the range of the function will also move down one cell.

Figure 4. Add the $ sign to the formula already entered.

To view the graph select the “Download” top menu

3) After added the sign ($) to the formula in cell G2, copy the contents of this cell down to cell G34 (Figure 5). As a result, each cell in the range G2: G34, you get the number of times to repeat each record in column B for each row in the range B2: B34, excluding cell count (or records) background (Figure 6) .

To verify that the function and copying CONTAR.SI have done their job, we are in a cell in the range G2: G34 than the first, for example the G10. Copied function in this cell must contain the following range:

= (B10: B $ 34, B10)

ie is counting how many times is the word “Antioquia” in the range B10: B34, which is correct for our interests (Figure 7). The reader can verify that this is true for any cell. For example, if we were located in cell G30, the range of the formula in that cell will be:

= (B30: B $ 34, B30)

Figure 5. After the formula is typed the $ sign in the appropriate position, the formula is copied from cell to cell G34 G2.

To view the graph select the “Download” top menu

Figure 6. Final result after CONTAR.SI copy the formula in cell to cell G34 G2.

To view the graph select the “Download” top menu

Figure 7. Demonstration that the copied formula has operated as expected.

To view the graph select the “Download” top menu

4) The next step is to use the Filter (Data menu). When enabled, this option (Figure 8), a tab appears in the front row of all columns that have information. If we stand in cell G1 (containing the title Count) and deploy the tab, we can immediately see a list of all the numbers in column G, then select the number 1 (Figure 9), and with this we will filtrate our database for that number. The resulting list (Figure 10) shows in column B without repeating all departments. This list can copy and paste it to another sheet (may be the same book or another book) and so we have obtained the desired result.

Figure 8. Activating the filter option (Data menu).

Figure 9. Filtering by Column G, with the number 1.

To view the graph select the “Download” top menu

Figure 10. Select a list resulting from column G, number 1. This list contains all the departments (column B) without repeating.

To view the graph select the “Download” top menu

Aspects to consider

1) This procedure also applies if you start from the last cell to the first (ie bottom-up). For this, in the case of the example given in step 1 we are located in cell G34 and enter the following formula:

= (B $ 2: B34, B34)

Now, the fixed cell (with a $) is B2, so as to copy the formula above, the cell will move end of the range and not the former. The result is different, in fact opposite to that at the beginning, but the filter by the number 1, you get the same list.

2) The result is the same if we add a $ sign before the letter B, ie, if you copy from the top down:

= (B2: $ B $ 34, B2)

or in case of copying from the bottom up:

= ($ B $ 2: B34, B34)

The $ sign before the letter B sets the range and protects copied horizontal movements, but as the movement that is copied is vertical, the $ sign before the letter B is not necessary.

3) It is not necessary that the text in each cell of the column you want to display this in some order. The procedure works well if the information is in total disarray.

4) The final list generated by the process described does not occur organized in alphabetical order (if it is text) unless the original column organized in this order. It is important to note that this final list is filtered so if you want to see in some order can not simply select and activate the Sort tool (Data menu) because this tool does not work when the filter is active. To sort the list, must be selected, copied and pasted into another worksheet.

5) The procedure described here may also provide a review of misspellings of text contained in the column that you want to examine. For example if the record Amazon is repeated 5 times, but twice, the word is spelled differently (eg “Amasonas” or “Amazon”), tell CONTAR.SI function “Amasonas” and “Amazon” as records distinct and appear in the final list. CONTAR.SI not discriminate function is case sensitive, so “Amazon” and “Amazon” will be counted as the same. CONTAR.SI function ignores blank cells.

An application for the issue of procedure.

The design of this procedure was motivated by work with a database of a collection of copies of approximately 13,000 records agencies. In this work we were asked to collect and analyze certain information from the database. For example, how many different species there are, there are many different locations, etc. So the procedure illustrated here was very useful for this purpose. With the list of names without repeating, you can get another set of data quickly with features like CONTAR.SI, SEARCH, etc.. Below is attached the database in Figure 1 for the reader’s copy in Excel and practice the procedure outlined.

References cited

Garcia de Jalon, J., Fernndez-Caballero, D. & Garcia-Martos, C. (2001). Learn Excel 2000

like I was at first. School of Industrial Engineering, University of Madrid. Madrid. Retrieved December 30, 2004 to:

Rivera-Gutierrez, H. F. (2004). Roadmap to integrate reference data sets for

validation of information using Microsoft Excel. Research Institute for Bioresources Alexander von Humboldt, Bogot, Colombia. Retrieved December 30, 2004 from

Author Details

Pablo Andrs Gonzlez Guzmn

Studies: Biology. Universidad del Valle, Cali, Colombia.

Professional experience: Managing Databases Biological Collections.

Category for this job: Computer.

Other Suggested Category: Bioinformatics.