
In this article, we will build a basic understanding of HTTP and how its headers work.
Most importantly, we will see how to make HTTP requests with C#. For web-crawler and web-scraper developers, the toughest part is getting the content out of a web application.

By the end of this article, you will be able to get the content of a URL using the GET and POST methods together with the right headers. Let’s proceed…

Content at a glance:

  • What is HTTP?
  • How to analyze the HTTP request?
  • How to make an HTTP request with C#?
  • What are headers and their types?
  • Conclusion

Now, I am going to explain each and every topic in detail.

What is HTTP?

HTTP stands for HyperText Transfer Protocol. It is the foundation of data communication for the World Wide Web. A protocol defines a set of rules that enable effective communication between computers. HTTP is an application-layer protocol in the Internet protocol suite (TCP/IP) and usually runs on top of TCP. It defines how messages are exchanged between a visitor’s browser and a website’s server, and those messages can carry text, images, video, graphics, sound, and other multimedia files.

What are HTTP headers?

HTTP headers are name-value pairs sent along with every request and response. The client uses request headers to tell the server what it wants and what it can accept, and the server uses response headers to describe what it sends back. Different requests need different headers, and with them we can get data from the server in different forms, such as text, images, graphics, sound, video, and other multimedia files.

How to analyze the HTTP request?

The biggest problem for web-crawler and web-scraper developers is analyzing the request URL and seeing which headers are sent with it. So don’t be bothered… here is the solution!

Below, you will learn how to see the headers in the browser. For any URL whose content you want to fetch, you must know which HTTP headers are sent with the request. Follow these steps to see them.

For example, suppose we want to analyze the URL “https://www.amazon.in/gp/product/B00UG4IMHS/ref=ox_sc_act_image_1?ie=UTF8&psc=1&smid=AT95IG9ONZD7S”

1. First, open the Chrome browser.

2. Then, click the menu button (the three vertical dots) at the far right of the address bar.

3. Then, go to More tools -> Developer tools. (Pressing F12 or Ctrl+Shift+I opens the same panel.)

You will see the image below:

4. Click on Developer tools.

You will see the image below:

Note: The panel is blank because we have not requested any URL yet.

5. Paste the URL into the address bar and press Enter. You will see the result shown in the image below.

6. Now, click the first row in the Network panel (usually the request for the page itself) and look at the result.

7. To see the content, click the Response tab in the pane on the right. This is the content we want to fetch. See the image below:

You may now be wondering which request headers are sent with this URL. They are listed in the Headers tab, next to the Response tab. See below:

Request header

Response header

Now you should be able to find the headers of any request in the Chrome browser.
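If you copy the raw request headers from the Headers tab (the control that switches to the raw view differs between Chrome versions), a small helper can turn them into a dictionary for your C# code. This is a minimal sketch, not a complete HTTP parser: the DevToolsHeaders class is a hypothetical name, and it assumes one "Name: value" pair per line.

```csharp
using System;
using System.Collections.Generic;

public static class DevToolsHeaders
{
    // Parses "Name: value" lines into a case-insensitive dictionary.
    // Lines that are not headers are skipped: blank lines, the request line
    // ("GET /path HTTP/1.1"), and HTTP/2 pseudo-headers such as ":authority".
    public static Dictionary<string, string> Parse(string raw)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = raw.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue;                  // no header name before the colon
            string name = line.Substring(0, colon).Trim();
            if (name.IndexOf(' ') >= 0) continue;      // e.g. "GET https://..." request line
            string value = line.Substring(colon + 1).Trim();
            // A repeated header is combined into one comma-separated value.
            headers[name] = headers.TryGetValue(name, out string existing)
                ? existing + ", " + value
                : value;
        }
        return headers;
    }
}
```

For example, after `var headers = DevToolsHeaders.Parse(copiedText);` you can read `headers["Referer"]` or `headers["Cookie"]` and set them on your request. The dictionary ignores case because header names are case-insensitive.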

One thing is still left: how to request the URL and get the content with C# code. Want to know how? Read on…

How to make an HTTP request with C#?

Now the question arises: how do we get the HTML or JSON content from a URL? Well, it’s quite simple!

The request is sent to the server with an HTTP method, usually GET or POST. Which one you need depends on what the URL expects, and you can see it as the Request Method in the Headers tab of Chrome’s developer tools when you analyze the request.

Let’s look at the code for each method.

HTTP request with GET method using C#

First of all, you can use the following method to get the content with the GET method.

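// Requires: using System.IO; using System.Net;
// The body is a reconstructed sketch based on HttpWebRequest (the type behind myWebRequest below).
public string GetHtmlSource(string URL)
{
    const int maxAttempts = 3; // retry a few times instead of looping forever on errors
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(URL);
            myWebRequest.Method = "GET";
            myWebRequest.UserAgent = "Mozilla/5.0"; // assumption: some sites reject requests without a User-Agent
            using (var response = (HttpWebResponse)myWebRequest.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Network or HTTP error (e.g. 404, 503): try again.
        }
    }
    return string.Empty; // every attempt failed
}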

Sometimes the request fails with an SSL/TLS error such as "Could not create SSL/TLS secure channel". To work around it, add the following.

In the method, add:

ServicePointManager.ServerCertificateValidationCallback += new RemoteCertificateValidationCallback(AlwaysGoodCertificate);

// Enable TLS 1.2. SSL 3.0 is obsolete and insecure, and many servers now require TLS 1.2 or later.
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

In addition, add the following method:

// Accepts every server certificate, including invalid or forged ones. This switches off
// certificate checking for all requests in the process, so use it only where you accept that risk.
private static bool AlwaysGoodCertificate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors policyErrors)
{
    return true;
}

The compiler will then report missing types. To resolve this, add the following namespaces:

using System.Net;

using System.Net.Security;

using System.Security.Cryptography.X509Certificates;

Note: Sometimes you may still not get the content. A common reason is that the server expects a Referer or Cookie header. In that case, set the Referer (if it is required), the cookie, or both on the request. Following is the syntax:

myWebRequest.Referer = "";  // put the referrer URL between the quotes

myWebRequest.Headers["Cookie"] = "";  // put the cookie string between the quotes

You can refer to the details about each header here.
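On newer .NET versions, HttpClient is the recommended API (WebRequest and HttpWebRequest are marked obsolete starting with .NET 6). The sketch below builds, but does not send, a GET request that carries the Referer and Cookie headers discussed above. The RequestBuilder class and BuildGet method are hypothetical names.

```csharp
using System;
using System.Net.Http;

public static class RequestBuilder
{
    // Builds (but does not send) a GET request with the Referer and Cookie headers
    // copied from DevTools. Empty arguments are simply left out.
    public static HttpRequestMessage BuildGet(string url, string referer, string cookie)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        if (!string.IsNullOrEmpty(referer))
            request.Headers.Referrer = new Uri(referer);   // sent as the "Referer" header
        if (!string.IsNullOrEmpty(cookie))
            // Raw "name1=value1; name2=value2" string. Assumption: with the default
            // HttpClientHandler.UseCookies = true, the handler manages cookies from its own
            // CookieContainer and may override or merge this value; set UseCookies = false
            // to send exactly this header.
            request.Headers.Add("Cookie", cookie);
        return request;
    }
}
```

You would then send it with something like `var response = await client.SendAsync(request);` and read the body with `await response.Content.ReadAsStringAsync()`.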

HTTP request with POST method using C#

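Sometimes you need to get the content using the POST method instead. To do this, you can use the following code.

public string _GetHtmlSource1(string url, string cookie, string Refrer, string PostData)
{   // Reconstructed sketch; needs System.IO and System.Net. The Content-Type is an assumption.
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "POST"; request.Referer = Refrer; request.Headers["Cookie"] = cookie;
    request.ContentType = "application/x-www-form-urlencoded"; // use the type seen in DevTools
    using (var writer = new StreamWriter(request.GetRequestStream())) writer.Write(PostData);
    using (var reader = new StreamReader(request.GetResponse().GetResponseStream())) return reader.ReadToEnd();
}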

How to find the post string?

Analyze the request in the Chrome or Firefox developer tools; the post string is shown together with the request (in recent Chrome versions, under the Payload tab). Copy it and pass it to the method as PostData.

As with the GET method, you will most probably need to add the Referer, cookie, or other headers as well.
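The post string you copy from DevTools is usually form-encoded: name=value pairs joined with & and percent-encoded. If you would rather build it from values in code, here is a minimal sketch. PostHelper and its methods are hypothetical names, and it assumes the server expects application/x-www-form-urlencoded (check the Content-Type in DevTools; a JSON API expects a JSON body instead).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;

public static class PostHelper
{
    // Joins name/value pairs into an application/x-www-form-urlencoded post string.
    // Uri.EscapeDataString percent-encodes reserved characters (space -> %20, '#' -> %23, '&' -> %26).
    public static string BuildPostString(IEnumerable<KeyValuePair<string, string>> fields) =>
        string.Join("&", fields.Select(f => Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));

    // HttpClient variant: FormUrlEncodedContent encodes the pairs itself and sets
    // Content-Type: application/x-www-form-urlencoded on the request body.
    public static HttpRequestMessage BuildPost(string url, IEnumerable<KeyValuePair<string, string>> fields) =>
        new HttpRequestMessage(HttpMethod.Post, url) { Content = new FormUrlEncodedContent(fields) };
}
```

The string returned by BuildPostString can be passed as PostData to the _GetHtmlSource1 method above.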

The following image shows the content:

What are headers and their types?

There are four types of HTTP message headers:

  • General header
  • Client Request header
  • Server Response header
  • Entity-header

1. General header

General header fields apply to both request and response messages, but not to the entity being transferred. The following are examples of general headers:

Cache-Control

Following is the syntax and example:

Syntax:

Cache-Control: cache-request-directive | cache-response-directive

For example:

Cache-Control: no-cache
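In C#, the System.Net.Http.Headers namespace can parse a Cache-Control value for you. Note that no-cache does not forbid caching: it means a cache must revalidate the response before reusing it, while no-store forbids storing it. A minimal sketch (CacheControlDemo is a hypothetical name) that reads the directives a scraper usually cares about:

```csharp
using System;
using System.Net.Http.Headers;

public static class CacheControlDemo
{
    // Reads three common directives from a Cache-Control value:
    //   no-cache: a cache may store the response but must revalidate it before reuse
    //   no-store: the response must not be stored at all
    //   max-age:  how long the response stays fresh (null when absent)
    public static (bool noCache, bool noStore, TimeSpan? maxAge) Read(string headerValue)
    {
        CacheControlHeaderValue cc = CacheControlHeaderValue.Parse(headerValue);
        return (cc.NoCache, cc.NoStore, cc.MaxAge);
    }
}
```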

Connection

Following is the syntax and example:

Syntax:

Connection: connection-token

For example:

Connection: close

Date

Following is the example:

For example:

Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123 (the preferred format)

Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036

Sun Nov  6 08:49:37 1994 ; ANSI C’s asctime() format
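Only the first (RFC 1123) form should be generated; the other two exist for backward compatibility, and recipients are expected to accept all three. A sketch of producing and reading that form in C# with the standard "r" format specifier (HttpDates is a hypothetical name):

```csharp
using System;
using System.Globalization;

public static class HttpDates
{
    // Formats a UTC DateTime in the RFC 1123 form, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
    // The "r" specifier does not convert time zones: pass a value that is already in UTC.
    public static string Format(DateTime utc) => utc.ToString("r", CultureInfo.InvariantCulture);

    // Parses the RFC 1123 form. AdjustToUniversal keeps the result in UTC instead of
    // converting it to local time; a day name that does not match the date is rejected.
    public static DateTime Parse(string value) =>
        DateTime.ParseExact(value, "r", CultureInfo.InvariantCulture, DateTimeStyles.AdjustToUniversal);
}
```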

Pragma

Following is the example:

For example:

Pragma: no-cache

Trailer

Following is the syntax:

Syntax:

Trailer: field-name

Transfer-Encoding

Following is the syntax:

Syntax:

Transfer-Encoding: chunked
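With chunked transfer coding, the body is sent as a series of chunks, each prefixed by its size in hexadecimal, and a zero-size chunk marks the end. HttpClient and HttpWebRequest decode this for you, so the sketch below only illustrates the format. It assumes the whole body is already in memory as a string, and the Chunked class is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Chunked
{
    // Decodes a "Transfer-Encoding: chunked" body: each chunk is a hex size line, CRLF,
    // that many characters of data, CRLF; a chunk of size 0 ends the body.
    // Chunk extensions (";name=value" after the size) are ignored and trailers are not parsed.
    // Real code should work on bytes, because chunk sizes count bytes, not characters.
    public static string Decode(string body)
    {
        var result = new StringBuilder();
        int pos = 0;
        while (true)
        {
            int lineEnd = body.IndexOf("\r\n", pos, StringComparison.Ordinal);
            string sizeField = body.Substring(pos, lineEnd - pos).Split(';')[0].Trim();
            int size = int.Parse(sizeField, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            if (size == 0) break;                   // last chunk reached
            int dataStart = lineEnd + 2;
            result.Append(body, dataStart, size);   // copy the chunk data
            pos = dataStart + size + 2;             // skip the data and its trailing CRLF
        }
        return result.ToString();
    }
}
```

For example, `Chunked.Decode("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns "Wikipedia".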

Upgrade

Following is the syntax:

Syntax:

Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11

Via

Following is the example:

For example:

Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)

Warning

Following is the syntax:

Syntax:

Warning: warn-code SP warn-agent SP warn-text [SP warn-date]

2. Client Request header

Client request header fields let the client pass additional information about the request, and about itself, to the server. They apply only to request messages.

The following are a few examples of these headers:

Accept

Following is the syntax and example:

Syntax:

Accept: type/subtype [;q=qvalue]

For example:

Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c

Accept-Charset

Following is the syntax and example:

Syntax:

Accept-Charset: character_set [;q=qvalue]

For example:

Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

Accept-Encoding

Following is the syntax and example:

Syntax:

Accept-Encoding: encoding types

For example:

Accept-Encoding: compress, gzip

Accept-Encoding:

Accept-Encoding: *

Accept-Encoding: compress;q=0.5, gzip;q=1.0

Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0

Accept-Language

Following is the syntax and example:

Syntax:

Accept-Language: language [;q=qvalue]

For example:

Accept-Language: da, en-gb;q=0.8, en;q=0.7
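The q-values rank the entries: an entry without q has the default weight 1.0, and a higher q means a stronger preference. A sketch that orders such a list with StringWithQualityHeaderValue from System.Net.Http.Headers (the QualityValues class is a hypothetical name); the same approach works for Accept-Charset and Accept-Encoding:

```csharp
using System;
using System.Linq;
using System.Net.Http.Headers;

public static class QualityValues
{
    // Orders the entries of a q-weighted header value, highest preference first.
    // A missing ";q=" means 1.0; OrderByDescending is stable, so ties keep their order.
    public static string[] ByPreference(string headerValue) =>
        headerValue.Split(',')
            .Select(part => StringWithQualityHeaderValue.Parse(part.Trim()))
            .OrderByDescending(v => v.Quality ?? 1.0)
            .Select(v => v.Value)
            .ToArray();
}
```

For example, `QualityValues.ByPreference("da, en-gb;q=0.8, en;q=0.7")` returns da, en-gb, en in that order.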

Authorization

Following is the syntax and example:

Syntax:

Authorization: credentials

For example:

Authorization: BASIC Z3Vlc3Q6Z3Vlc3QxMjM=
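For the Basic scheme, the credentials are just the base64 encoding of user:password; the example value above decodes to guest:guest123. A minimal sketch (BasicAuth is a hypothetical name). Because base64 is an encoding and not encryption, such credentials should only be sent over HTTPS.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class BasicAuth
{
    // Builds the value for "Authorization: Basic ...": base64 of "user:password" in UTF-8.
    public static AuthenticationHeaderValue Create(string user, string password) =>
        new AuthenticationHeaderValue("Basic",
            Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password)));
}
```

With HttpClient you could then write `client.DefaultRequestHeaders.Authorization = BasicAuth.Create("guest", "guest123");`.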

Cookie

Following is the syntax and example:

Syntax:

Cookie: name=value

For example:

Cookie: name1=value1

If we need to send multiple cookies, we can put them in one header, separated by semicolons. Following is the syntax:

Cookie: name1=value1; name2=value2; name3=value3
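If you collect cookies as name/value pairs, you can assemble that header value in code, and split it back apart. A minimal sketch with CookieHeader as a hypothetical class name; it does no escaping and does not handle duplicate cookie names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CookieHeader
{
    // Joins name/value pairs into one Cookie request-header value.
    // RFC 6265 separates the pairs with "; " (a semicolon followed by a space).
    public static string Build(IEnumerable<KeyValuePair<string, string>> cookies) =>
        string.Join("; ", cookies.Select(c => c.Key + "=" + c.Value));

    // Splits a Cookie header value back into pairs. Pieces without '=' are skipped,
    // and a duplicate cookie name makes ToDictionary throw in this simple sketch.
    public static Dictionary<string, string> Parse(string header) =>
        header.Split(';')
              .Select(part => part.Trim())
              .Where(part => part.IndexOf('=') > 0)
              .ToDictionary(part => part.Substring(0, part.IndexOf('=')),
                            part => part.Substring(part.IndexOf('=') + 1));
}
```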

Expect

Following is the syntax:

Syntax:

Expect: 100-continue | expectation-extension

From

Following is the example:

For example:

From: webmaster@w3.org

Host

Following is the syntax:

Syntax:

Host: host [ ":" port ]

A Host value without a port implies the default port for the scheme, which is 80 for http (443 for https). For example, a request to the origin server for http://www.w3.org/pub/WWW/ would be:

GET /pub/WWW/ HTTP/1.1

Host: www.w3.org
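To make the protocol concrete, here is a sketch that builds the raw text of such a request from a URL, i.e. the characters a client writes to the TCP connection. RawRequest is a hypothetical name, and only the Host and Connection headers are included:

```csharp
using System;
using System.Text;

public static class RawRequest
{
    // Builds the text of a minimal HTTP/1.1 GET request for an absolute URL.
    // The request line carries only the path and query; the host name goes into the
    // mandatory Host header, with ":port" appended only when it is not the scheme's default.
    public static string BuildGet(string url)
    {
        var uri = new Uri(url);
        string host = uri.IsDefaultPort ? uri.Host : uri.Host + ":" + uri.Port;
        var sb = new StringBuilder();
        sb.Append("GET ").Append(uri.PathAndQuery).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(host).Append("\r\n");
        sb.Append("Connection: close\r\n");   // ask the server to close the connection after replying
        sb.Append("\r\n");                     // an empty line ends the header section
        return sb.ToString();
    }
}
```

You could write this string to a TcpClient stream to talk to a plain-HTTP server by hand; in practice HttpClient builds these lines for you.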

If-Match

Following is the syntax and example:

Syntax:

If-Match: entity-tag

For example:

If-Match: "xyzzy"

If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"

If-Match: *

If-Modified-Since

Following is the syntax and example:

Syntax:

If-Modified-Since: HTTP-date

For example:

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

If-None-Match

Following is the syntax and example:

Syntax:

If-None-Match: entity-tag

For example:

If-None-Match: "xyzzy"

If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"

If-None-Match: *
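On the server side, If-None-Match drives conditional GETs: if one of the listed tags matches the current ETag, the server can reply 304 Not Modified without a body. A simplified sketch of that check (ConditionalGet is a hypothetical name; it assumes the entity tags contain no commas):

```csharp
using System;
using System.Linq;

public static class ConditionalGet
{
    // Returns true when a GET carrying this If-None-Match value can be answered with
    // 304 Not Modified. "*" matches any current representation; otherwise the header is
    // a comma-separated list of entity tags. If-None-Match uses weak comparison, so a
    // W/ prefix on either side is ignored.
    public static bool IsNotModified(string ifNoneMatch, string currentETag)
    {
        if (ifNoneMatch.Trim() == "*") return true;
        string target = StripWeak(currentETag);
        return ifNoneMatch.Split(',').Any(tag => StripWeak(tag) == target);
    }

    // Drops surrounding whitespace and an optional W/ prefix, leaving the quoted tag.
    private static string StripWeak(string tag)
    {
        string t = tag.Trim();
        return t.StartsWith("W/", StringComparison.Ordinal) ? t.Substring(2) : t;
    }
}
```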

If-Range

Following is the syntax and example:

Syntax:

If-Range: entity-tag | HTTP-date

For example:

If-Range: Sat, 29 Oct 1994 19:43:31 GMT

If-Unmodified-Since

Following is the syntax and example:

Syntax:

If-Unmodified-Since: HTTP-date

For example:

If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT

Max-Forwards

Following is the syntax and example:

Syntax:

Max-Forwards: n

For example:

Max-Forwards: 5

Proxy-Authorization

Following is the syntax:

Syntax:

Proxy-Authorization: credentials

Range

Following is the syntax and example:

Syntax:

Range: bytes-unit=first-byte-pos "-" [last-byte-pos]

For example:
– The first 500 bytes

Range: bytes=0-499

– The second 500 bytes

Range: bytes=500-999

– The final 500 bytes

Range: bytes=-500

– The first and last bytes only

Range: bytes=0-0,-1
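Given the total size of a resource, each of these ranges maps to a concrete byte span, which the server reports back in a Content-Range header. A sketch using RangeHeaderValue from System.Net.Http.Headers (ByteRanges is a hypothetical name; only the first range is used, and unsatisfiable ranges are not checked):

```csharp
using System;
using System.Linq;
using System.Net.Http.Headers;

public static class ByteRanges
{
    // Resolves the first range in a Range header value against a resource of
    // `length` bytes and returns the matching Content-Range value.
    //   "bytes=500-999" -> bytes 500 to 999 (capped at the last byte)
    //   "bytes=500-"    -> byte 500 to the end
    //   "bytes=-500"    -> the last 500 bytes (suffix range)
    public static string ToContentRange(string rangeHeaderValue, long length)
    {
        RangeItemHeaderValue r = RangeHeaderValue.Parse(rangeHeaderValue).Ranges.First();
        long first, last;
        if (r.From == null)
        {
            first = Math.Max(0L, length - r.To.Value);
            last = length - 1;
        }
        else
        {
            first = r.From.Value;
            last = Math.Min(r.To ?? length - 1, length - 1);
        }
        return $"bytes {first}-{last}/{length}";
    }
}
```

For a 1234-byte resource, "bytes=-500" resolves to "bytes 734-1233/1234", which matches the Content-Range examples later in this article.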

Referer

Following is the syntax and example:

Syntax:

Referer: absoluteURI | relativeURI

For example:

Referer: https://www.google.co.in/?gfe_rd=cr&dcr=0&ei=xZm0WuilI5OdX9uniegC

TE

Following is the syntax and example:

Syntax:

TE: t-codings

For example:

TE: deflate

TE:

TE: trailers, deflate;q=0.5

User-Agent

Following is the syntax and example:

Syntax:

User-Agent: product | comment

For example:

User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)

3. Server Response header

When the server answers a request, its response carries response header fields. These let the server pass additional information about the response that cannot be placed in the status line, and they apply only to response messages.

The following are a few server response headers and their syntax:

Accept-Ranges

Following is the syntax and example:

Syntax:

Accept-Ranges: range-unit | none

For example:

Accept-Ranges: bytes

Age

Following is the syntax and example:

Syntax:

Age: delta-seconds

For example:

Age: 1030

ETag

Following is the syntax and example:

Syntax:

ETag: entity-tag

For example:

ETag: "xyzzy"

ETag: W/"xyzzy"

ETag: ""

Location

Following is the syntax and example:

Syntax:

Location: absoluteURI

For example:

Location: https://www.google.co.in/?gfe_rd=cr&dcr=0&ei=xZm0WuilI5OdX9uniegC

Proxy-Authenticate

Following is the syntax:

Syntax:

Proxy-Authenticate: challenge

Retry-After

Following is the syntax and example:

Syntax:

Retry-After: HTTP-date | delta-seconds

For example:

Retry-After: Fri, 31 Dec 1999 23:59:59 GMT

Retry-After: 120
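A client that receives Retry-After (typically with a 503 or 429 response) can turn either form into a wait time before retrying. A sketch with RetryConditionHeaderValue from System.Net.Http.Headers (RetryAfter is a hypothetical name; the caller supplies the current time so the date form can be converted):

```csharp
using System;
using System.Net.Http.Headers;

public static class RetryAfter
{
    // Returns how long to wait before retrying. Retry-After carries either a number of
    // seconds or an HTTP-date; the date form is measured from `now`.
    public static TimeSpan Delay(string headerValue, DateTimeOffset now)
    {
        RetryConditionHeaderValue v = RetryConditionHeaderValue.Parse(headerValue);
        if (v.Delta.HasValue) return v.Delta.Value;
        TimeSpan wait = v.Date.Value - now;
        return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;   // date already passed: retry now
    }
}
```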

Server

Following is the syntax and example:

Syntax:

Server: product | comment

For example:

Server: Apache/2.2.14 (Win32)

Set-Cookie

Following is the syntax and example:

Syntax:

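Set-Cookie: NAME=VALUE; OPTIONS

For example:

Set-Cookie: name1=value1; Expires=Wed, 09 Jun 2021 10:18:14 GMT

Each Set-Cookie header sets a single cookie, so a server that wants to set two cookies sends two Set-Cookie headers.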

Vary

Following is the syntax and example:

Syntax:

Vary: field-name

For example:

Vary: Accept-Language, Accept-Encoding

WWW-Authenticate

Following is the syntax and example:

Syntax:

WWW-Authenticate: challenge

For example:

WWW-Authenticate: BASIC realm="Admin"

4. Entity-header

Request and response messages may transfer an entity unless the request method or the response status code rules it out. An entity consists of entity-header fields and an entity-body, although some responses include only the entity headers. The following are a few examples of these headers:

Allow

Following is the syntax and example:

Syntax:

Allow: Method

For example:

Allow: GET, HEAD, PUT

Content-Encoding

Following is the syntax and example:

Syntax:

Content-Encoding: content-coding

For example:

Content-Encoding: gzip

Content-Language

Following is the syntax and example:

Syntax:

Content-Language: language-tag

For example:

Content-Language: mi, en

Content-Length

Following is the syntax and example:

Syntax:

Content-Length: DIGITS

For example:

Content-Length: 3495

Content-Location

Following is the syntax and example:

Syntax:

Content-Location: absoluteURI | relativeURI

For example:

Content-Location: http://www.tutorialspoint.org/http/index.htm

Content-MD5

Following is the syntax and example:

Syntax:

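Content-MD5: md5-digest, i.e. the base64 encoding of the 128-bit MD5 digest, as per RFC 1864

For example:

Content-MD5: jC1GkR8/WjJkVfDteo7Tsw==

This is the digest 8c2d46911f3f5a326455f0ed7a8ed3b3 written in base64, the form the header requires, rather than in hex. Note that RFC 7231 later removed Content-MD5 from HTTP/1.1, so servers rarely send it today.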
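The header value is the base64 encoding of the 16 raw digest bytes, which differs from the 32-character hex string most tools print. A sketch of computing it for a body, and of converting a hex digest into the base64 form (ContentMd5 is a hypothetical name):

```csharp
using System;
using System.Security.Cryptography;

public static class ContentMd5
{
    // Content-MD5 value for a body: base64 of the 16-byte MD5 digest (RFC 1864).
    public static string ForBody(byte[] body)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(body));
        }
    }

    // Converts a 32-character hex digest into the base64 form the header expects.
    public static string FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return Convert.ToBase64String(bytes);
    }
}
```

For an empty body, ForBody returns "1B2M2Y8AsgTpgAmY7PhCfg==", the base64 form of the well-known empty-input digest d41d8cd98f00b204e9800998ecf8427e.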

Content-Range

Following is the syntax and example:

Syntax:

Content-Range: bytes-unit SP first-byte-pos "-" last-byte-pos "/" (instance-length | "*")

For example:

– The first 500 bytes:

Content-Range: bytes 0-499/1234

– The second 500 bytes:

Content-Range: bytes 500-999/1234

– All except for the first 500 bytes:

Content-Range: bytes 500-1233/1234

– The last 500 bytes:

Content-Range: bytes 734-1233/1234

Content-Type

Following is the syntax and example:

Syntax:

Content-Type: media-type

For example:

Content-Type: text/html; charset=ISO-8859-4
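A scraper often needs the charset parameter to decode the body correctly. A sketch that splits the value with MediaTypeHeaderValue from System.Net.Http.Headers (ContentTypeInfo is a hypothetical name):

```csharp
using System;
using System.Net.Http.Headers;

public static class ContentTypeInfo
{
    // Splits a Content-Type value into its media type and charset parameter
    // (charset is null when the header does not specify one).
    public static (string mediaType, string charset) Read(string headerValue)
    {
        MediaTypeHeaderValue v = MediaTypeHeaderValue.Parse(headerValue);
        return (v.MediaType, v.CharSet);
    }
}
```

On .NET Core and .NET 5+, only a few encodings (such as UTF-8 and ISO-8859-1) are built in, so decoding ISO-8859-4 with Encoding.GetEncoding requires calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` first.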

Expires

Following is the syntax and example:

Syntax:

Expires: HTTP-date

For example:

Expires: Thu, 31 Dec 2015 16:00:00 GMT

Last-Modified

Following is the syntax and example:

Syntax:

Last-Modified: HTTP-date

For example:

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

If you need an explanation of every header, click here.

Concluding Words

After going through this blog, you should understand what HTTP headers are and what each of them means. You should also be able to get content with C# code and have a clear understanding of the GET and POST methods.

For further queries, feel free to leave your comments in the comments section below!
