UniformResourceIdentifier


According to http://www.w3.org/Addressing, a UniformResourceIdentifier (URI) is

: The generic set of all names/addresses that are short strings that refer to resources.

The set of URIs is a superset of UniformResourceLocators and UniformResourceNames.

For more on URI syntax and semantics, see RFC 2396, which gives the following grammar [Appendix A]:

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
      absoluteURI   = scheme ":" ( hier_part | opaque_part )
      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]
      hier_part     = ( net_path | abs_path ) [ "?" query ]
      opaque_part   = uric_no_slash *uric
      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","
      net_path      = "//" authority [ abs_path ]
      abs_path      = "/"  path_segments
      rel_path      = rel_segment [ abs_path ]
      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )
      scheme        = alpha *( alpha | digit | "+" | "-" | "." )
      authority     = server | reg_name
      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )
      server        = [ [ userinfo "@" ] hostport ]
      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )
      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit      
      path          = [ abs_path | opaque_part ]
      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )      
      param         = *pchar
      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","
      query         = *uric      
      fragment      = *uric
      uric          = reserved | unreserved | escaped
      reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","      
      unreserved    = alphanum | mark
      mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                      "(" | ")"      
      escaped       = "%" hex hex
      hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"
      alphanum      = alpha | digit      
      alpha         = lowalpha | upalpha
      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

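As a rough illustration of how these productions read in practice, here is a minimal Python sketch that transcribes a few of them (scheme, escaped, hostname) into regular expressions. The names SCHEME_RE, ESCAPED_RE, HOSTNAME_RE, is_scheme and is_hostname are made up for this example and appear nowhere in the RFC. The sketch covers only these productions, not the whole grammar.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")

# escaped = "%" hex hex
ESCAPED_RE = re.compile(r"%[0-9A-Fa-f]{2}")

# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME_RE = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def is_scheme(text):
    """True if the whole string matches the 'scheme' production."""
    return SCHEME_RE.fullmatch(text) is not None


def is_hostname(text):
    """True if the whole string matches the 'hostname' production."""
    return HOSTNAME_RE.fullmatch(text) is not None
```

With these definitions, is_scheme("svn+ssh") holds while is_scheme("1http") does not, because a scheme must start with a letter. Likewise is_hostname("example.123") fails, because toplabel must begin with an alpha.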
RFC 2396 also gives a regular expression for parsing a URI [Appendix B]:

   The following line is the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

   The numbers in the second line above are only to assist readability;
   they indicate the reference points for each subexpression (i.e., each
   paired parenthesis).  We refer to the value matched for subexpression
   <n> as $<n>.  For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related

   results in the following subexpression matches:

      $1 = http:
      $2 = http      
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related

   where <undefined> indicates that the component is not present, as is
   the case for the query component in the above example.
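
The expression can be tried out directly. What follows is a minimal Python sketch, not part of the RFC: URI_RE holds the Appendix B expression verbatim, and split_uri is a made-up helper name. Python reports an unmatched group as None, which plays the role of <undefined> above.

```python
import re

# Regular expression copied verbatim from RFC 2396, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into the five components $2, $4, $5, $7, $9."""
    # Every part of the expression is optional, so the match always succeeds.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),      # None when there is no "?" part
        "fragment": m.group(9),   # None when there is no "#" part
    }


parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
for name, value in parts.items():
    print(name, "=", value)
```

For the example URI, the query comes back as None and the fragment as Related, matching the table above. A relative reference such as ../g?y#s has neither scheme nor authority, so both of those come back as None.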

The fun stupidity of the URI spec:

http://minorest.minor.major/major/minor/minorest

The least significant domain segment comes first, but the least significant path segment comes last.

ftp://com.example.ftp/pub/incoming would have been better than ftp://ftp.example.com/pub/incoming. -- SunirShah

: Maybe I should add a UserPreference to display URL links the "right" way, and automatically translate URLs when edited by such users? --CliffordAdams (This feature is scheduled slightly after the "World Peace" module. :-)

: One reasonable explanation for the order of DNS names is that "optional" data should be added at the end of a string. Before DNS, many tools used a simple single-word hostname. Many tools had to accept both local hosts and new DNS names. It's easier to add data to the end of a string, especially in languages like C. Just be glad X.400 didn't really happen. --CliffordAdams (who had a "...!inhp4!" email address and remembers playing with the "newfangled DNS")

Outlook Server uses X.400 / X.500. Go figure. -- JürgenHermann
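
For the curious, the "right way" translation discussed above is easy to sketch. This is a toy Python example, not an actual wiki feature. The function name big_endian is made up, and it assumes the authority is a bare hostname with no userinfo or port.

```python
from urllib.parse import urlsplit, urlunsplit


def big_endian(uri):
    """Hypothetical helper: reverse the dot-separated host labels of a URI."""
    parts = urlsplit(uri)
    # Assumption: netloc is a bare hostname (no "user@" prefix, no ":port").
    reversed_host = ".".join(reversed(parts.netloc.split(".")))
    return urlunsplit(parts._replace(netloc=reversed_host))


print(big_endian("ftp://ftp.example.com/pub/incoming"))
```

Applying the function twice gives back the original URI, so the same helper would serve for both display and editing.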

CategoryWebTechnology
