Friday, November 27, 2009

Week 12 Readings

I liked this week's articles, and they were on a topic I'm interested in, so that was even better. I particularly liked that it wasn't just a bunch of articles on tagging or folksonomy in general, but rather articles on different aspects of folksonomy and what it can be used for (i.e. library instruction, academic libraries, etc.). I did think the articles were pretty basic, though, almost bordering on too basic. The chosen articles could have used a bit more depth, but it was good that they related to libraries.

1.) Allan, "Using a Wiki"

This article was about how libraries can use a wiki to improve library instruction by sharing information, facilitating collaboration in the creation of resources, and efficiently dividing workloads among different librarians. A wiki is kind of like a Word document in which you can edit text and attach files.

This article focuses mainly on how wikis can be used to help in library instruction, whether from the library itself or in a particular class with the help of a professor. Library instruction wikis have two main uses: sharing knowledge and expertise, and cooperating in the creation of resources. Wikis are extremely easy to use and free to create. Once you create one you can invite other users to participate in it, and they can change the wiki. Wikis are beginning to catch on in many different workplaces.

This was an interesting article, yet somewhat basic. I also think there are many more useful ways to use a wiki than in library instruction, though it would certainly be helpful there as well.

2.) Arch, "Creating the Academic Library Folksonomy"

This article is about social tagging and the advantages it offers libraries. Social tagging is a new phenomenon that allows people to create tags for websites and store them online. This could be very useful to libraries: it could help them better support their users' research goals and needs. The article then gives some examples of sites that use tagging, like Delicious.

I had a fairly big problem with this article, especially since it was written in 2007 (fairly recently). The article makes it seem like the only thing tagging is good for is a glorified bookmarking system. It talks about how you can save and tag a website and then retrieve it later on a different computer, much like bookmarking it to a server so it can be found on other machines. Social tagging's capabilities go way beyond this; that is only a small part of the advantage of being able to tag things. Tagging allows things to be stored and organized in ways the physical world never could. With tagging you can organize a book under many different categories instead of it having to sit in one specific spot on a shelf. The article touched on the advantages of tagging but was far too narrow and did not begin to show the scope of what is possible with tagging.
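To make that point concrete for myself, here is a toy sketch in Python (book titles and tag names are made up) of why tags are more flexible than a single shelf location: the same item can sit under as many categories as you like.

```python
# Toy tag store: each tag maps to the set of items carrying it, so one
# book can appear under several categories at once (titles/tags made up).
from collections import defaultdict

tags = defaultdict(set)

def tag(item, *labels):
    for label in labels:
        tags[label].add(item)

tag("Dewey Meets Turing", "digital-libraries", "history", "computer-science")
tag("Web Search Engines", "search-engines", "computer-science")

print(tags["computer-science"])  # both books show up under this tag
```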

3.) Wales, "How a ragtag band created Wikipedia"

This was an interesting video, much like the Google one we watched earlier in the semester, as it explained how Wikipedia was formed, its goals, and what it is currently working on. Wikipedia is a free encyclopedia written by thousands of volunteers. The goal of the organization is to provide access to as much information as possible for as many people as possible.

The thing I liked most about the video is that it addressed the main issues/controversies/myths people have about Wikipedia. The main issue is that many people believe that because anybody can contribute to and change a Wikipedia article, it is not reliable, especially because people can edit articles anonymously. The creator says this is not as big a problem as people think. The software Wikipedia uses is open-ended and everything is left up to volunteers; because of this, people police themselves and other users instead of just creating false articles. Wikipedia maintains a neutral point of view, and again, because many people may be working on the same article, this is not as hard to do as people may think (even with political issues they can maintain a neutral point of view). The one thing I thought was very interesting is what the creator called the "Google Test": if the topic of an article does not show up in a Google search, then it is probably not worthwhile enough to have an encyclopedia article about it.

Overall, I liked this video, especially because it gave me some background on a website that I use almost daily.

Week 11 Muddiest Point

When doing link analysis, why don't they look at the outgoing links of a website as well, instead of just incoming links? It seems like outgoing links would also be a helpful way to analyze a website, rather than limiting the analysis to incoming links.


Friday, November 20, 2009

Week 11 Readings

So it seems I have been way out of order in doing the blogs and readings. I remember Dr. He saying that the order of content was going to be switched, but when I went to do the readings I completely forgot. I did the readings by their dates in CourseWeb and not in the order I was supposed to. I hope this won't negatively affect my grade, as I have done all of them over the past 3 weeks, just in the wrong order. It seems I should be back on track for week 12; it was only the last 3 weeks that were out of order.

This week's readings were interesting and on a topic I enjoy. It was a little different since I am out of order and we have already discussed this topic in class, but I still thought the readings were good to read, even if they were late.

1.) Mischo, Digital Libraries: Challenges and Influential Work

Effective search and discovery over open and hidden digital resources is still problematic and challenging. There are differences between providing digital access to collections and actually providing digital library services. This is a very good point, and I liked it a lot. Simply providing access to a lot of digital collections does not mean you are providing digital library services.

The first significant federal investment in digital library research was in 1994. There has recently been a surge in interest in metasearch, or federated search, among many different people and institutions. The majority of the rest of the article discussed previous and current research being done in digital libraries and the institutions doing it.

2.) Paepcke, et al., Dewey Meets Turing

In 1994 the NSF launched the Digital Library Initiative (DLI), which brought libraries and computer scientists together to work on the project. The invention and growth of the World Wide Web changed many of their initial ideas. The web almost instantly blurred the distinction between the consumers and the producers of information.

One point in the article I found very interesting was that the computer scientists didn't like all the restrictions placed upon them by the publishers. They were not allowed to make all of their work public, because that would have made public all of the materials in it (i.e. the publishers' copyrighted material). This is interesting because it brought digital copyright restrictions to light for people who may not have understood them.

3.) Lynch, Institutional Repositories

Institutional repositories are a new strategy that allows universities to accelerate changes in scholarship and scholarly communication. The author defines institutional repositories as a set of services a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. This includes preservation of materials when needed, organization, and access or distribution. He thinks a good IR should contain materials from both faculty and students, and both research and training materials.

Universities are not doing a good job facilitating new forms of scholarly communication. Faculty are better at creating ideas than at being system administrators and disseminators of their works. IRs could solve these problems. They address both short-term access and long-term preservation, and have the advantage of being able to maintain data as well.

The author sees some areas where IRs can go astray or become counterproductive: if the IR is used by administration to exercise control over what had been faculty-controlled work; if the infrastructure is overloaded with distracting and irrelevant policies; and if the IR is implemented hastily. Just because other universities are implementing new IRs doesn't mean you should rush in and start one yourself. These are points the author believes universities need to look at closely when implementing institutional repositories.

IRs promote progress in infrastructure standards in many different ways; the author gives three examples. Preservable formats - the things in the IR should be preserved, though different institutions will do this in different ways. Identifiers - referencing materials in IRs will be important in scholarly dialogue and the scholarly record. Rights documentation and management - managing rights for digital materials will be essential, and you need a way to document the rights and permissions of the works.

I liked the article on IRs the best. I'm interested in the many advantages of institutional repositories and how to best implement them, and this article was extremely informative.

Week 10 Muddiest Point

I understand that XML is a general markup language and can be used for many things besides building webpages. Why is XML preferred for building webpages specifically, though? It seems that HTML is much less complicated and hence less time consuming to use. Maybe I'm just more familiar with writing HTML as opposed to XML, but XML seems more in depth and looks like it takes a lot more effort to produce a webpage.

Saturday, November 14, 2009

Week 10 Readings

These readings were on a topic I'm not overly familiar with, so I learned a lot. I knew the basics behind search engines, but not many details about how they actually work.

1.) Hawking - Web Search Engines

Part 1.
Search engines cannot and should not attempt to index every page on the World Wide Web. To be cost-effective they must reject low-value automated content. Search engines require large-scale replication to handle the load. Search engines currently crawl/index approximately 400 terabytes of data; crawling at 10 GB/s, a full crawl of that much information would reportedly take 10 days. Crawlers use "seeds" to begin their search, and then follow the links they find to URLs they haven't yet indexed.

Crawlers must address many issues. Speed - it would take too long for each crawler to individually crawl every website, so each one only crawls the URLs for which it is responsible. Politeness - only one crawler goes to a URL at a time so it doesn't overload the site's servers. Excluded content - they must respect the robots.txt file that specifies whether or not they may crawl the website. Duplicate content - it is simple for a search engine to see if two pages have the same text, but distinguishing between different URLs, dates, etc. requires more sophisticated measures. Continuous crawling - how often a website is crawled/indexed is determined by numerous factors, not just static measures. Spam rejection - websites can artificially inflate their status by adding links to themselves so crawlers will rank them higher.
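As a way of keeping these ideas straight for myself, here is a rough sketch of a crawl loop in Python (the seed URL and user-agent name are made up, and real crawlers add rate limiting, distribution over many machines, and much more): it starts from seeds, respects robots.txt, and only crawls URLs it hasn't seen before.

```python
# Minimal crawl loop: seeds, a frontier of unvisited URLs, politeness via
# robots.txt, and duplicate URLs crawled only once (seed URL is made up).
from collections import deque
from urllib import robotparser
from urllib.request import urlopen
from urllib.parse import urljoin
import re

seeds = ["http://example.org/"]
frontier, seen = deque(seeds), set(seeds)

while frontier:
    url = frontier.popleft()
    robots = robotparser.RobotFileParser(urljoin(url, "/robots.txt"))
    robots.read()
    if not robots.can_fetch("toy-crawler", url):    # respect exclusions
        continue
    page = urlopen(url).read().decode("utf-8", errors="ignore")
    for link in re.findall(r'href="(http[^"]+)"', page):
        if link not in seen:                        # don't re-crawl duplicates
            seen.add(link)
            frontier.append(link)
```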

Part 2.
An indexer creates an inverted file in two phases: scanning and inversion. The internet's vocabulary is very large - it contains documents in all languages as well as made-up words such as acronyms, trademarks, proper names, etc. Search engines can use compression to reduce demands on disk space and memory. Search engines also assign each page a link popularity score based on the frequency of incoming links.
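Here is my rough picture of what an inverted file is, as a toy Python sketch (the documents are made up, and real indexers also store term positions, use compression, and so on): for each term, keep the list of documents it appears in.

```python
# Build a toy inverted file: term -> set of document IDs containing it.
from collections import defaultdict

docs = {
    1: "web search engines crawl the web",
    2: "search engines build an inverted file",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

print(sorted(inverted["search"]))  # [1, 2]
print(sorted(inverted["web"]))     # [1]
```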

The most common query to a search engine is a small number of words without operators. By default, search engines return only documents containing all the query words. Result quality can be improved if the query processor scans to the end of the lists and then sorts the long list of candidates with a relevance-scoring function. MSN Search reportedly takes over 300 ranking factors into account when sorting its lists.
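As I understand it, the default "all the query words" behavior comes down to intersecting posting lists; a toy sketch is below (the posting lists are made up), with real engines then ranking the surviving documents by their relevance-scoring function.

```python
# Conjunctive query: return only documents whose IDs appear in every
# query term's posting list (toy posting lists, made up).
inverted = {
    "web":     {1, 3},
    "search":  {1, 2, 3},
    "engines": {1, 2},
}

def all_words_query(query):
    postings = [inverted.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(all_words_query("search engines"))  # {1, 2}
print(all_words_query("web search"))      # {1, 3}
```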

Search engines also have techniques for speeding up their searches and how fast results are displayed. They often skip hundreds or thousands of documents to get to the required documents. They also stop processing after scanning only a small fraction of the lists. Search engines cache results, precomputing and storing result pages for thousands of the most popular queries.

2.) Shreeves - OAI Protocol

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has been widely adopted since its initial release in 2001. There are now over 300 active data providers. OAI's mission is "to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content." The protocol divides the work between repositories, which make the metadata available, and harvesters, which collect it. No one provider can serve the entire needs of the public, so specialized ones have popped up.
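To see what a harvest actually looks like, here is a minimal sketch of an OAI-PMH request in Python (the repository base URL is made up; the verb and metadataPrefix parameters come from the protocol itself):

```python
# Fetch a batch of Dublin Core records from a (hypothetical) OAI-PMH
# data provider; the response is XML that a harvester would then parse.
from urllib.request import urlopen
from urllib.parse import urlencode

base_url = "http://repository.example.edu/oai"   # made-up base URL
params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

with urlopen(f"{base_url}?{params}") as response:
    print(response.read()[:500])   # first few hundred bytes of the XML
```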

Many OAI registries suffer from a number of shortcomings: typically no search engine, limited browsing, and the fact that they are incomplete. The UIUC research group is trying to address these problems; they are trying to enhance collection-level descriptions to enable better search functions. OAI also has some challenges it is currently dealing with: the metadata itself, problems with data provider implementations, and lack of communication between service and data providers.

3.) Bergman - Deep Web

This article was a tad too technical/scientific for me, but was still an extremely good article. I didn't necessarily enjoy (or understand) all the technical data and scientific research information in it, but I really liked the concepts and theories behind it. The main point of the article was something I had never really explored or read about in depth. Overall, this was one of my favorite reads of the semester, purely for the concepts it discussed.

The article's main concept is that most of the web's information is buried deep where search engines are unable to find it. The deep web is very different from the surface web. The deep web consists largely of searchable databases that must be searched "one at a time," as opposed to websites that search engines can crawl. BrightPlanet tried to quantify the deep web, and their statistics actually blew me away. I knew there was a lot of information "hidden" within the web and the Internet, but I did not realize the extent of it. The deep web is 400 to 550 times larger than the surface web (which equates to about 7,500 TB as opposed to 19 TB). The deep web has about 550 billion documents as opposed to the surface web's 1 billion. About 95% of the deep web is also accessible to the public for free.

The internet is more diversified than people realize, and the deep web is the major reason for that. The web is really only a portion of the Internet. The deep web covers a broad and relevant range of topics. The information contained in the deep web is of very high quality and is growing much faster than the surface web.

Week 9 Muddiest Point

I actually had no muddiest point this week.

Saturday, November 7, 2009

Assignment #5


Koha Assignment - The books I chose all have something to do with Pittsburgh.

Wednesday, November 4, 2009

Week 9 Readings

I have absolutely no experience with or knowledge of XML, so these articles were very helpful to me, though sometimes a little too technical for me to understand. I now have a much better understanding of how to use XML, but am still unclear on a few things that will hopefully be talked about in class.

1.) Bryan, "Introducing the Extensible Markup Language (XML)"

XML is a subset of the Standard Generalized Markup Language (SGML), and is designed to make it easy to interchange documents over the Internet. With XML you must always clearly mark your start and end tags, as opposed to HTML, where it is sometimes acceptable not to close a tag. An XML Document Type Definition (DTD) can be used to check that the components of an XML document appear in valid places.

XML is based on a concept of documents composed of a series of entities (things or objects); each entity can contain one or more logical elements, which can have attributes. One of the interesting things about XML is the way it incorporates special characters into the markup; I liked how it handled this.

An XML file has three types of markup, the first two of which are optional: first, the XML processing instruction, which identifies the version of XML being used; second, the document type declaration; and lastly, the fully tagged document instance. If all three are present, the document is considered "valid." If only the last one is present, the document is "well-formed." XML is also well suited for use with databases.
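Here is a tiny example I put together while reading (the document and DTD name are made up) showing all three kinds of markup, with Python's standard parser used as a quick well-formedness check; actually validating against the DTD would need a validating parser such as lxml.

```python
# Prolog + document type declaration + tagged document instance.
# parseString() raises an error if the document is not well-formed;
# it does not validate against memo.dtd (that needs a validating parser).
from xml.dom.minidom import parseString

doc = """<?xml version="1.0"?>
<!DOCTYPE memo SYSTEM "memo.dtd">
<memo>
  <to>Library staff</to>
  <body>Training moved to 3 &amp; not 2.</body>
</memo>"""

parseString(doc)
print("well-formed")
```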

2.) Uche, "A Survey of XML Standards"

I liked how this article was set up and that it provided a brief overview of many different parts of XML and what they can do, not just technical information on how to use or write XML.

XML has been widely translated into different languages, but English is still the standard. There was some controversy when XML 1.1 came out. The new version had only very small changes, and people wondered whether a new version was really necessary, especially because there was a good chance interoperability issues would arise.

XML catalogs define a format for instructions on how a processor resolves entities into actual documents. XML namespaces provide a mechanism for naming elements and attributes. XInclude is still being developed, but it will provide a system for merging different XML documents; this is usually used to split large documents into manageable chunks and then merge them back together again. XPath can be used to locate certain elements in a document. XLink is a generic framework for expressing links in XML.
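A small namespace/XPath example I tried while reading (the record structure is made up; the Dublin Core namespace URI is a real one): the prefix only matters together with the URI it is bound to, and the path expression is scoped by that namespace.

```python
# Find every dc:title element, supplying the prefix-to-URI mapping
# explicitly; ElementTree supports a limited subset of XPath.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<record><dc:title>Digital Libraries</dc:title></record>'
    '<record><dc:title>Metadata Basics</dc:title></record>'
    '</catalog>'
)

ns = {"dc": "http://purl.org/dc/elements/1.1/"}
for title in doc.findall(".//dc:title", ns):
    print(title.text)
```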

3.) Bergholz, "Extending Your Markup"

I really liked the figures in this article; they provided good examples of XML. The examples were complex enough that you got a decent idea of how to write XML, but not so complicated that you couldn't understand what they were trying to portray.

XML is all about meaningful annotations. DTDs define the structure of XML documents; they specify a set of tags, the order of tags, and the attributes associated with each tag. XML elements can be either terminal or nonterminal. Nonterminal elements contain subelements, which can be grouped as sequences or choices. Attributes are declared in a DTD with the !ATTLIST declaration.
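To see how a DTD expresses sequences, choices, and !ATTLIST attributes together, I sketched a tiny one (the element names are made up; validation uses the lxml library, since the standard library only checks well-formedness):

```python
# A DTD declaring structure: memo must contain a <to> then either a
# <body> or a <summary> (a choice), and carries a required date attribute.
from io import StringIO
from lxml import etree  # assumes lxml is installed

dtd = etree.DTD(StringIO("""
<!ELEMENT memo (to, (body | summary))>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT summary (#PCDATA)>
<!ATTLIST memo date CDATA #REQUIRED>
"""))

doc = etree.XML('<memo date="2009-11-04"><to>Staff</to><body>Hi</body></memo>')
print(dtd.validate(doc))   # True
```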

XML extensions include namespaces and addressing and linking abilities. Unlike HTML, it is not necessary to use an anchor in XML, and extended links can connect multiple documents together. Namespaces avoid name clashes. Extensible Stylesheet Language (XSL) allows you to transform XML into HTML. XML Schema allows the user to define datatypes.
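To convince myself the XSL-to-HTML claim works in practice, here is a minimal transform (the stylesheet and input are made up; it uses the lxml library, since the Python standard library has no XSLT processor):

```python
# Turn a tiny XML document into an HTML list with an XSLT stylesheet.
from lxml import etree  # assumes lxml is installed

xml_doc = etree.XML("<books><book>Dewey Meets Turing</book>"
                    "<book>Web Search Engines</book></books>")

xslt = etree.XML("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="books/book">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt)
print(str(transform(xml_doc)))   # an HTML <ul> with one <li> per book
```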

4.) W3Schools Tutorial - XML Schema

Much like the HTML tutorials, the XML tutorial was very helpful; I really like the W3Schools tutorials and their website. I do wish this tutorial had given more technical information on how to actually write XML Schemas (like it did with HTML), as opposed to more theory about what XML Schema can do.

An XML Schema describes the structure of an XML document and can be used as an alternative to DTDs, as it is much more powerful. An XML Schema defines elements, attributes, the order of elements, and many other things. It supports datatypes and is itself written in XML, which has many advantages. A simple element contains only text and cannot have attributes. If an element has an attribute, it is considered complex. Restrictions can be used to define acceptable values for elements and attributes.

A complex element contains other elements and/or attributes. There are four kinds of complex elements. Empty complex elements - they can't have content, only attributes. Element-only elements - they contain only other elements. Text-only elements - they can contain text and attributes. Mixed elements - they can contain attributes, elements, and text. Indicators control how elements are to be used. String datatypes are used for values that contain character strings.
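Here is a small schema I sketched while reading to see these pieces together (the element names and the restriction are made up; validation uses the lxml library): a complex element with a sequence of child elements and an attribute, plus a simple element constrained by a restriction.

```python
# Validate a document against an XML Schema: <book> is a complex element
# (a sequence of children plus an attribute); <year> is restricted to
# integers of at least 1450. Names and values are made up.
from lxml import etree  # assumes lxml is installed

xsd = etree.XMLSchema(etree.XML("""<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:minInclusive value="1450"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))

doc = etree.XML('<book isbn="0-00"><title>Example</title><year>2009</year></book>')
print(xsd.validate(doc))   # True
```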

After reading the articles I have a much better understanding of XML. One thing I'm still confused about, though, is the advantage of XML over other markup languages, specifically HTML. A lot of the articles stated that XML was better and gave theoretical reasons why. From looking at the many examples of things being written in XML, it looks enormously more complicated than HTML. Something that can be written in HTML in a few lines looks like it takes far more lines in XML. I know XML is supposed to be better than HTML; it just looks more complicated and time consuming.

Week 8 Muddiest Point

I understand how to make a cascading style sheet and the purpose of it. How do you apply the style sheet to the document you are writing in HTML? I know that with an external style sheet you provide a link to it when you write your HTML document, but once you have what you have written in HTML and your cascading style sheet, how do you "combine" the two?
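Here is a minimal sketch of the linking mechanism as I understand it (the file names and the CSS rule are made up): the HTML file points at the external stylesheet from its head section, and the browser fetches and applies those rules when it loads the page.

```python
# Write a page and its external stylesheet; the <link> element in the
# HTML head is what "combines" the two when a browser opens index.html.
html_page = """<html>
  <head>
    <link rel="stylesheet" type="text/css" href="style.css">
  </head>
  <body><p>This paragraph is styled by the external sheet.</p></body>
</html>"""

stylesheet = "p { color: navy; }"

with open("index.html", "w") as f:
    f.write(html_page)
with open("style.css", "w") as f:
    f.write(stylesheet)
```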