In the last couple of weeks I blogged about keywords as they were displayed to the users of the OPAC in the library of the Peace Palace. I showed a couple of maps built with Gephi, some exhaustive, others very detailed.
But how did I collect and adapt the data for use in Gephi? I already mentioned "keywords exposed to the user" in an earlier blog. So to start with: what is the meaning of "exposed keywords"? By this I mean "keywords as they occur in the presentation of the titles which were actually seen, perhaps even read, by the user". These are the keywords I am interested in.
The keywords listed in just one title can be considered a very small network: all these keywords are linked to one another. Therefore, the first step is to gather all the presented titles, the second step is to collect all these small keyword networks, and the last step is to merge them into one huge file which Gephi can use.
In the table below I give some examples of the file structure. In the left column you see five keywords (for Gephi they are nodes), each with a count of one, called 'use'. Underneath that you see the unique combinations of the keywords (for Gephi they are edges), also with a count, called 'weight'. The number after the capital P is a unique keyword identifier. Gephi prefers to work with simple codes instead of (sometimes) long strings with weird characters in them. In the middle column you see the same, except that the keywords now come from another title. In the third column both sets of keywords are combined. Note the keyword 'Women': it occurs in both titles, so in the third column its 'use' is raised to two. At the bottom of each column you see the corresponding Gephi map.
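The merging step described above can be sketched in a few lines of Python. This is only an illustration, not the PHP scripts I actually used: the two keyword lists are hypothetical (only 'Women' comes from the example), the P identifiers are simply numbered in alphabetical order, and the column names `Id`, `Label`, `Source`, `Target` and `Weight` are assumed to be the ones Gephi's spreadsheet import looks for.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists for two titles; only 'Women' is taken from the text.
titles = [
    ["Women", "Human rights", "Discrimination"],
    ["Women", "Armed conflict", "International criminal law"],
]

def build_network(titles):
    """Count keyword use (nodes) and keyword pairs (edges) over all titles."""
    use = Counter()
    weight = Counter()
    for keywords in titles:
        unique = sorted(set(keywords))          # one count per title, stable pair order
        use.update(unique)
        weight.update(combinations(unique, 2))  # every unique pair within one title
    return use, weight

def assign_ids(use):
    """Give each keyword a short code such as 'P1' (numbering is illustrative)."""
    return {kw: f"P{i}" for i, kw in enumerate(sorted(use), start=1)}

def to_csv(header, rows):
    """Render rows as CSV text that could be saved and imported into Gephi."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

use, weight = build_network(titles)
ids = assign_ids(use)
nodes = [(ids[kw], kw, n) for kw, n in sorted(use.items())]
edges = [(ids[a], ids[b], w) for (a, b), w in sorted(weight.items())]

nodes_csv = to_csv(["Id", "Label", "use"], nodes)
edges_csv = to_csv(["Source", "Target", "Weight"], edges)
print(nodes_csv)
print(edges_csv)
```

Because 'Women' appears in both lists, its 'use' ends up at two while every edge keeps a weight of one, just as in the third column of the table.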
Of course I do not create these lengthy files (15,000 lines and more) by hand. I wrote a couple of crude PHP scripts to generate a rough file. I clean this file up with R and Microsoft Excel, and the resulting file is ready to be used by Gephi. The scripts use a MongoDB collection which contains all the logging of OPAC use in our reading room. In this logging it is possible to detect 'exposed titles' (and therefore also the keywords in them).
To conclude: this is all very technical stuff, and we cannot expect our users to do this kind of research themselves on the basis of raw data provided by the library. However, some library staff members should certainly be able to do this, and then communicate about the results, using interesting maps for instance: communicate to management about library collection issues, to users about trends, about almost lost niches in the collection, and about current, important subcollections which can be used in updating dossiers, research guides, alerting systems, etc.
Thursday, 27 November 2014
Thursday, 20 November 2014
Just below the surface.
Using Gephi ("an interactive visualization and exploration platform for all kinds of networks") to create unprocessed maps of the keywords exposed to users in the library of the Peace Palace results in an image dominated by a few huge subjects. These subjects are indicators of the core business of the library: Human Rights, European Union, United States of America, International Law and International Criminal Law, to name just a few. To the left you see a very reduced image of such a map, but a few main keywords are still discernible.
Et voilà: after using the option 'rank parameter' in Gephi and choosing betweenness centrality, a new overall map appears, now with newly highlighted nodes or keywords. Before zooming in, I will try to explain what betweenness centrality is. In brief, betweenness centrality is an indicator of a key position: the higher the value, the more important the role of the keyword. The value is calculated by finding the shortest paths between every pair of keywords in our network and counting how often a keyword lies on those paths. The keyword which appears most often in between two other keywords has the highest betweenness centrality value; such keywords are brokers or intermediaries. I used these values to create the map at the left.
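To make the idea of a 'broker' keyword concrete, here is a minimal Python sketch of betweenness centrality for a small, unweighted keyword network. It follows the well-known Brandes approach and is not Gephi's own code; Gephi may scale or normalize its values differently. The tiny graph and its links are hypothetical, chosen so that 'Trade' is the only connection between two groups of keywords.

```python
from collections import deque

def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph.

    graph maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search: count shortest paths from source to every node.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes and pass credit to predecessors.
        dep = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1 + dep[w])
            if w != source:
                score[w] += dep[w]
    # Every undirected pair was visited from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: 'Trade' bridges two small groups of keywords.
graph = {
    "Art": {"Cultural heritage", "Trade"},
    "Cultural heritage": {"Art", "Trade"},
    "Trade": {"Art", "Cultural heritage", "Commerce", "Environment"},
    "Commerce": {"Trade", "Environment"},
    "Environment": {"Trade", "Commerce"},
}

scores = betweenness(graph)
print(max(scores, key=scores.get))  # the broker keyword in this toy network
```

In this toy network every shortest path between the two groups runs through 'Trade', so it is the only keyword with a betweenness value above zero.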
Those of you who would like to have the data file used in Gephi to create all the maps shown, do contact me at a.janson at ppl dot nl.
Thursday, 13 November 2014
Keywords! Maps! Let's dive in.
Last week I blogged about maps and keywords: Library and user: one interest? I presented a few maps, created with Gephi, with which I tried to compare the activities of the library staff with the interests of OPAC users. I talked about general subjects like 'international criminal law', 'space debris' and things like that.
These maps can also be used to get a detailed picture, although I admit the presented maps are a bit difficult to read after zooming in. However, librarians can use Gephi itself to do detailed research in order to find out what our patrons are looking for.
See for example this image, clipped from the Gephi overview graph frame, which shows keywords about art, trade and illegal activities in just a tiny section of the map. I think librarians can use such insights to better serve their users, especially if they detect recurring patterns in searches over a longer period. If librarians can 'translate' these insights into more relevant acquisitions, improvements to their research guides (in this case the Peace Palace Library, Cultural Heritage) or specific blogs or tweets, I am sure interested visitors will return to the library.
Of course users can manipulate the map with OPAC searches and focus on just one group (to the left you see the International Criminal Law group), but even one large group can be quite intimidating. Nevertheless, those users who take some time can obtain a thorough knowledge of the keywords grouped around one or two core subjects of the Peace Palace Library. Just start by selecting a group using 'Group Selector', then click the largest bubble and check all the other keywords in the 'Information Pane'.
Labels: gephi, keywords, library, Peace Palace Library
Location: South Holland, The Netherlands
Thursday, 6 November 2014
Library and user: one interest?
Quote: "But also interesting is, to see whether the library staff takes the interests of the patrons into account while acquiring documents for their collection? That is a subject for another blog."
Here I am referring to an earlier blogpost in which I tried to show what our users are looking for in the OPAC of the Peace Palace Library. In order to make this happen I focused on the use of our link resolver and presentation of a general subject in this link resolver. I used Tableau to create some graphs.
However, the same thing can be done on the basis of the title descriptions which appeared on the screens in the reading room of the library after a successful search. So I collected all these titles and used all the keywords added to these titles to create a map using Gephi. In yet another blog I reported about this, although there I used the recent acquisitions of the month of September.
To answer the question "whether the library staff takes the interests of the patrons into account while acquiring documents for their collection?" I created two maps for comparison: one about the acquisitions in October and the other about the use of the OPAC in the reading room in the same month.
Acquisitions | OPAC |
If I enumerate the main subject topics which can be identified on indicated webpages, we get the following lists:
Acq: | OPAC: |
So our user behavior indicates special interest in Space, Environment and Commerce, among other things, which were not covered by our library staff. Conversely, the library acquired material about Immigration, Islamic Law and Trade which was not looked for by our OPAC users. Still, the most important observation is that both parties share their interest in the core business of the library of the Peace Palace: Criminal law, Human rights, European Union.
Differences exist only with regard to the peripheral areas, and for a large part these can be related to current events, such as boat refugees in the Mediterranean Sea, terrorism in the Middle East, space debris and environmental issues.
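The comparison above boils down to set arithmetic. As a small Python sketch: the two sets below contain only the subjects named in this post, not the full lists behind the maps, and the function name `compare` is just an illustration.

```python
# Only the subjects mentioned in the text; the complete lists are longer.
acquisitions = {
    "Criminal law", "Human rights", "European Union",
    "Immigration", "Islamic Law", "Trade",
}
opac = {
    "Criminal law", "Human rights", "European Union",
    "Space", "Environment", "Commerce",
}

def compare(acquired, searched):
    """Split subjects into shared, acquired-only and searched-only groups."""
    return {
        "shared": sorted(acquired & searched),
        "only_acquired": sorted(acquired - searched),
        "only_searched": sorted(searched - acquired),
    }

result = compare(acquisitions, opac)
for group, subjects in result.items():
    print(group, subjects)
```

The 'shared' group is the core business of the library; the other two groups are the peripheral areas where staff and users diverge.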
Anyway, the simple fact that the 'small subjects' are also found and acquired, means that the library of the Peace Palace is on the right track. The 'small subjects' looked for now, were added in the past!