Get Clues from URLs

URL stands for Uniform Resource Locator.
URLs are basically Internet addresses - the location at which you can find a particular resource.
They look something like this:

http://useful.clue.net/

You will see them at the top of your Web browser (in the Location box if you use Netscape or the Address box if you use Internet Explorer).

 
Use the URL as a clue - it can provide a lot of information about a resource and your location within it. 

URLs:

A lot can be deduced from a URL before you look at the site itself. For example, consider the information that can be gleaned from the following URL:

http://www.bps.org.uk/publicat/Periodicals/Psych/PSY9_97.HTM

So just by looking at this URL you can deduce that:
  1. You will find a UK Web site belonging to an organisation.
  2. You will find a file, within a directory, within a directory, within a directory, i.e. you'll be taken to a page deep within a collection of related resources.

Dissecting URLs

The basic structure of a URL is:

protocol://server-name.domain-name/directory/filename

1)  Protocol

The first part of a URL - before the colon - describes the access method (protocol).

Data can be made available on the Internet via a number of different protocols:
 
http://     a World Wide Web (WWW) server
ftp://      File Transfer Protocol (FTP)
mailto:     email
telnet://   Telnet
gopher://   Gopher
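
As a rough illustration, the protocol can also be read off a URL by a program. The sketch below uses Python's standard urllib.parse module. The PROTOCOLS table and the describe_protocol function are names invented for this example and simply restate the list above; the mailto address is made up.

```python
from urllib.parse import urlsplit

# Descriptions restate the list above; the table name is illustrative.
PROTOCOLS = {
    "http": "World Wide Web (WWW) server",
    "ftp": "File Transfer Protocol",
    "mailto": "email",
    "telnet": "Telnet",
    "gopher": "Gopher",
}


def describe_protocol(url):
    """Return (protocol, description) for the part of the URL before the colon."""
    scheme = urlsplit(url).scheme.lower()
    return scheme, PROTOCOLS.get(scheme, "unknown protocol")


print(describe_protocol("http://useful.clue.net/"))
print(describe_protocol("mailto:someone@example.org"))  # made-up address
```

Any protocol missing from the table is reported as unknown rather than guessed.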
 

2)  Machine Address

The second part of a URL - after the protocol and // - tells you about the machine that you are accessing.

This can offer useful clues, since this part of the address sometimes tells you the country in which the machine is located and the nature of the organisation that owns it.

Server names

People can name their machines whatever they like. When a machine is used as a WWW server, this name is called the server name.

Most organisations use their name within their server name, for example:

http://www.harvard.edu/

is the Internet address of the Harvard University Web server.
 

Domain names

The domain name identifies the position of the resource on the Internet. People have to formally apply for a domain name (to their Internet provider or an Internet company) so that no two machines can have the same address on the Internet. Domain names can offer you useful clues, since they can include country codes and organisational codes.

For example:

.ac.uk

... indicates that the resource is held on an academic server (.ac) in the United Kingdom (.uk).

Cracking Country Codes

You can sometimes get a clue about the country the server is based in from the country code.  For example:
 
au   Australia
ca   Canada
de   Germany
fr   France
uk   United Kingdom

Note, however, that a country code will not always be included in a URL. Many American sites, for example, will not have the country code (.us) in their URL.

Cracking Organisational Codes

You can get clues about the nature of the organisation that owns the server from the organisation code.  For example:
 
ac, edu    academic or educational servers
co, com    commercial servers
gov        government servers
org        non-governmental, non-profit-making organisations

Note that different countries can have different codes for the same type of organisation. For example, a university server may have a .ac code in the UK (ac is short for "academic") but a .edu code in the USA (edu is short for "educational").

A list of country and organisation codes is available in the Appendices.
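
The two kinds of code can be looked up mechanically. The following is a minimal sketch, assuming only the handful of codes listed above. The table names and the domain_clues function are invented for this illustration, and the result is a hint about the server, not a fact.

```python
from urllib.parse import urlsplit

# Small sample tables taken from the examples above; not a complete list.
COUNTRY_CODES = {
    "au": "Australia",
    "ca": "Canada",
    "de": "Germany",
    "fr": "France",
    "uk": "United Kingdom",
}
ORGANISATION_CODES = {
    "ac": "academic",
    "edu": "educational",
    "co": "commercial",
    "com": "commercial",
    "gov": "government",
    "org": "non-profit organisation",
}


def domain_clues(url):
    """Guess (country, organisation type) from the machine address."""
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    country = None
    # A country code, when present, is the last part of the address.
    if labels[-1] in COUNTRY_CODES:
        country = COUNTRY_CODES[labels.pop()]
    # The organisation code is then the last remaining part, if we know it.
    organisation = ORGANISATION_CODES.get(labels[-1]) if labels else None
    return country, organisation


print(domain_clues("http://www.bps.org.uk/publicat/Periodicals/Psych/PSY9_97.HTM"))
# ('United Kingdom', 'non-profit organisation')
print(domain_clues("http://www.harvard.edu/"))
# (None, 'educational')
```

A value of None means only that the code is missing from these sample tables or from the URL.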
 
 

Warning!

The domain and server names may not always be straightforward clues about the location and source of the information.  

People can give their servers any name they wish, and it is possible to register domain names that give a false impression.

For example, it is possible (though perhaps unlikely) that the URL: 

http://MacDonalds.com

does not point to the site of a hamburger outlet but to "Old MacDonald's Farm Supplies"!
 
 

3)  Directories and filenames

After the machine name, between the next set of slashes (/) you will see the names of directories containing the file you are accessing.

Many Internet resources are organised into directory structures similar to those found in other computer applications.  These can provide useful clues about the structure of the site.

For example:

http://www.bps.org.uk/publicat/Periodicals/Psych/PSY9_97.HTM

has a fairly complex directory structure - three directories are given (publicat, Periodicals and Psych) before you see the name of the actual file (always at the end on the right hand side of the URL).

This is a clue as to the size and complexity of the site - generally speaking, the more directories, the more complex the site. Bear in mind that a complex site is not necessarily a high quality site!

This is also a clue that this URL would take you to a file deep within the site.

Being speculative, this URL probably takes you to a 1997 issue of a periodical on a subject from the field of psychology.
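
The same dissection can be sketched in a few lines of Python. The split_path function is a name made up for this example; it separates the directories from the file name and lets you count how deep the file sits.

```python
from urllib.parse import urlsplit


def split_path(url):
    """Return (directories, filename) from the path part of a URL."""
    segments = urlsplit(url).path.split("/")
    # Empty segments (such as the one before the leading slash) are dropped.
    directories = [s for s in segments[:-1] if s]
    filename = segments[-1]  # empty when the URL ends with a slash
    return directories, filename


directories, filename = split_path(
    "http://www.bps.org.uk/publicat/Periodicals/Psych/PSY9_97.HTM"
)
print(directories)       # ['publicat', 'Periodicals', 'Psych']
print(filename)          # PSY9_97.HTM
print(len(directories))  # 3
```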
 


Practical Hints and Tips

Deleting parts of the URL to learn more about the site

It can be very useful to delete part of the right hand side of the URL to see where the new, shorter URL takes you.

By doing this you can get clues as to your location within the site and the structure of the site.

By deleting parts of the URL from the right-hand side, back to each single slash mark (/) in turn, you will move up the directory tree and see how the file is embedded in the site.

For example, look what happens if you delete part of the following URL:
 
URL:        http://www.ariadne.ac.uk/issue13/music/
Points to:  an online article
Tells you:  this is an online article

URL:        http://www.ariadne.ac.uk/issue13/
Points to:  the contents page of issue 13 of a journal
Tells you:  the article is in issue 13 of this journal

URL:        http://www.ariadne.ac.uk/
Points to:  the home page of an e-journal
Tells you:  the article is contained in this journal

You can delete part of the URL by putting your cursor at the end of the URL in the "location box", pressing the "backspace" or "delete" key until you reach the slash (/), and then pressing the "Return" key.

Delete from the right, up to the slashes in the URL.

This technique can be especially useful for long URLs.
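
The trimming described above can also be done by a short program. This is a sketch, not a definitive tool: parent_urls is a hypothetical helper name, and the code only builds the shorter URLs. It does not check that they lead anywhere.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List the shorter URLs obtained by deleting back to each slash in turn."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, keeping a slash at the end.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "".join(s + "/" for s in segments[:depth])
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for shorter in parent_urls("http://www.ariadne.ac.uk/issue13/music/"):
    print(shorter)
# http://www.ariadne.ac.uk/issue13/
# http://www.ariadne.ac.uk/
```

The last URL in the list is always the root of the site.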

Finding the Home Page of a Web site

A home page is the front page of a Web site - the equivalent of the cover of a book - and as such can offer useful information such as the title, author and a summary of what the site is about.

Hyperlinks on the WWW often drop you right in the middle of a site as opposed to at the home page. This can make it difficult to work out where you are. It is good practice to look at the home page of a site before you use it, just to ascertain exactly what it is you are looking at.

URLs can help with this - the root of a URL will often take you to the home page. It won't always work - but try deleting the file names and directory names on the right of the URL and then hit the return key (make sure your new URL ends with either / or html or htm). This may take you to the home page of the site. For example:

The British Monarchy Web site:

http://www.royal.gov.uk/family/diana.htm    a page deep within the site
http://www.royal.gov.uk/                    the home page

This can be especially useful when you are looking at search engine results, which often take you deep within Web sites rather than to the home pages.

Tip!

URLs ending in:
/welcome.html
/index.html
/default.html

are often home pages
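
Both ideas, cutting a URL back to its root and spotting a likely home page, can be sketched in Python. The function names and the HOME_PAGE_NAMES tuple are assumptions made for this example. The tuple holds only the three file names from the tip above, and real sites may use others.

```python
from urllib.parse import urlsplit, urlunsplit

# File names from the tip above; real sites may use others.
HOME_PAGE_NAMES = ("welcome.html", "index.html", "default.html")


def site_root(url):
    """Strip directories and file name, leaving the root of the site."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def looks_like_home_page(url):
    """True if the path is empty, a single slash, or ends in a typical home page name."""
    path = urlsplit(url).path.lower()
    return path in ("", "/") or path.rsplit("/", 1)[-1] in HOME_PAGE_NAMES


deep_page = "http://www.royal.gov.uk/family/diana.htm"
print(looks_like_home_page(deep_page))             # False
print(site_root(deep_page))                        # http://www.royal.gov.uk/
print(looks_like_home_page(site_root(deep_page)))  # True
```

As with the manual technique, the root is only a good guess at where the home page lives.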
 
 

The tilde ~ sign

In some URLs you will see the tilde sign, which looks like this: ~

For example:

http://www.ilrt.bris.ac.uk/~cmpac/

Use the tilde as a clue!

By convention, many Web servers use the ~ symbol to mark the personal directories of individual users.

If the URL contains a tilde then be aware that you are probably (although not definitely) looking at a personal page with personal opinions rather than an official site giving the official line.

However, this does not mean that the information is necessarily of poor quality.

For example the following Web page has a tilde in the URL:

http://www.ilrt.bris.ac.uk/~cmpac/

The page is located on a University of Bristol server, but is NOT an official page of the University - it is the personal page of a member of staff.
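
Checking for a tilde is easy to automate. In the sketch below, personal_directory is a name invented for this example. It also decodes %7E, which is how some URLs write the tilde.

```python
from urllib.parse import unquote, urlsplit


def personal_directory(url):
    """Return the name after the tilde if a path segment starts with ~, else None."""
    for segment in urlsplit(url).path.split("/"):
        segment = unquote(segment)  # some URLs write the tilde as %7E
        if segment.startswith("~") and len(segment) > 1:
            return segment[1:]
    return None


print(personal_directory("http://www.ilrt.bris.ac.uk/~cmpac/"))  # cmpac
print(personal_directory("http://www.harvard.edu/"))             # None
```

A match suggests a personal page; it says nothing about the quality of the information.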


PURLs

Some URLs will have the word "PURL" located in the early part of the URL.

PURL stands for Persistent Uniform Resource Locator.   For example:

http://purl.org/metadata/dublin_core

A PURL is a clue that the owner of the resource is committed to keeping the site stable and persistently available via a given URL.

To obtain a PURL, the owner has had to register the site with an intermediary PURL service. If for any reason the site moves to a new address, the owner registers the change of address with the PURL service, which then redirects any users to the new URL.

A PURL address should not lead you to a dead link and should mean that the same URL will always point to the same resource even if, behind the scenes, the resource has been moved from server to server.
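
The idea behind a PURL service can be pictured as a lookup table from a stable address to the current one. The toy model below is only a sketch of that idea. PurlRegistry and both server addresses are invented for illustration, and a real service answers Web requests with redirects rather than returning values from a Python object.

```python
class PurlRegistry:
    """Toy model of a PURL service: a table from stable names to current URLs."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, current_url):
        """Record, or update, where the resource currently lives."""
        self._targets[purl] = current_url

    def resolve(self, purl):
        """Return the URL a user would be redirected to, or None if unknown."""
        return self._targets.get(purl)


registry = PurlRegistry()
purl = "http://purl.org/metadata/dublin_core"
registry.register(purl, "http://old-server.example/dc/")  # hypothetical address
registry.register(purl, "http://new-server.example/dc/")  # the resource moved
print(registry.resolve(purl))  # http://new-server.example/dc/
```

The address that users know never changes; only the entry behind it does.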

