The first step in our example is to prepare the browser proxy so that all traffic to and from Google is successfully routed via the Composable Architecture Platform Proxy Server. This will give us visibility of the data and provide all of the information we need to manipulate it.
Many browsers have built-in security features that prevent access to a website whenever it presents an untrusted SSL certificate, and will block the request without exception.
In our example, because it is not possible to install Google’s SSL certificate on the Proxy Server, we overcome this by using redirection settings within the Proxy Server. Under Administration, Server Definitions, click on the Proxy Server as follows.
Click on the Forwarding tab and set the Request redirection properties for Google as follows. Our example is for a UK IP address request, which follows the redirect of Google.com to Google.co.uk based upon the IP geolocation from the originating browser.
The first entry only illustrates the format and has no effect on the Proxy Server:
http://thishost>http://thishost:8001
http://google.com>https://google.com
http://www.google.com>https://www.google.com
http://google.co.uk>https://google.co.uk
http://www.google.co.uk>https://www.google.co.uk
Once you have input the redirection settings, scroll to the bottom of the page and save the modified Proxy Server definition.
The Proxy Server will now redirect these requests from HTTP to HTTPS and allow the browser to access the website even without a correct SSL certificate.
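The forwarding entries shown earlier follow a simple source>target format. As a rough illustration only, the following Python sketch parses entries in that format and rewrites a URL using the first entry whose source matches. The function names and the prefix-matching behaviour are assumptions made for this example; the Proxy Server's actual matching logic is not described in this guide.

```python
def parse_redirects(text):
    """Parse 'source>target' lines into a list of (source, target) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ">" not in line:
            continue
        source, target = line.split(">", 1)
        rules.append((source, target))
    return rules


def apply_redirect(url, rules):
    """Rewrite url with the first matching rule (assumed: prefix match on the source)."""
    for source, target in rules:
        if url == source or url.startswith(source + "/"):
            return target + url[len(source):]
    return url  # no rule matched, leave the URL unchanged


RULES = parse_redirects("""
http://google.com>https://google.com
http://www.google.com>https://www.google.com
http://google.co.uk>https://google.co.uk
http://www.google.co.uk>https://www.google.co.uk
""")

print(apply_redirect("http://www.google.co.uk/search?q=dishwasher", RULES))
```

In this sketch a URL that matches none of the entries is returned untouched, which mirrors the idea that only the listed Google hosts are redirected.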
Next, deploy a configuration to the Proxy Server. The configuration we will use in this example is the one named BasicWebTrial, which is under Configurations->Product Trial in the administration tree:
When you click on it, you will be presented with a number of options:
At this stage we are not going to make any changes to the configuration itself; the only changes are those made earlier to the Proxy Server definition.
So now deploy it and start the Proxy Server by clicking on Deploy.
You will see a choice of servers you can deploy to:
Select the Proxy Server as shown, check Restart immediately and then click Deploy. You will then see the action window switch to the server view showing the configuration and all of its dependencies being deployed to the proxy:
Once complete, you will see that the Proxy Server is started and ready to use:
Now that the Proxy Server is running, the browser needs to be configured. There are a number of different ways of doing this, depending on the browser of your choice.
Our preferred method is to use one browser (e.g. Chrome or IE) for managing the console and another browser (e.g. Firefox) for browsing via the Proxy Server. The advantage of this approach is that Firefox has its own local proxy settings, so basic queries and other web browsing unrelated to our testing can continue in the non-proxied browser.
Note: When using the Composable Architecture Platform browser proxy to access secure web sites over HTTPS, you will encounter certificate warnings in the browser. In the past, these warnings were relatively easy to get around by clicking on the Advanced button and adding an exception. However, with the advent of HTTP Strict Transport Security (HSTS), this is no longer possible for sites that use it, as the browser will refuse to add the exception.
The Browser Certificate Installation Guide (in the documents folder) provides instructions on how to overcome this problem by installing a trusted certificate authority into your browser that Composable Architecture Platform in turn will use to generate valid replacement certificates for each SSL site on the fly.
The following shows how to configure the browser proxy in Firefox Quantum 60.0 on Windows 2012 Server:
Select Options, then click on Network Proxy > Settings:
Set the proxy options as shown below:
Every X Engine receives data in the form of variables. These variables are initially supplied by an input adaptor. The most commonly used input adaptor receives web application input, but other adaptors receive XML data, CSV data or other more complex input.
For the purpose of understanding the above rule set, the web application input adaptor supplies the variables REQUEST_URL, URI and REQUEST_TIMESTAMP. It also supplies as variables any parameters provided by GET or POST requests. To obtain more detailed information about the HTTP request, the HTTP Request Tracker rule is used.
The reason for this separation is that you may not need all of the detailed information for most requests (such as images). This example provides a quick window into the world of Composable Architecture Platform. The next step is to create a configuration that will have a more interactive result.
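To make the idea of input variables more concrete, the following Python sketch builds a dictionary that resembles what the web application input adaptor is described as supplying. Only the variable names REQUEST_URL, URI and REQUEST_TIMESTAMP come from this guide; the function name, the timestamp format (epoch milliseconds) and the exclusion of the query string from URI are assumptions made for illustration.

```python
import time
from urllib.parse import urlsplit, parse_qsl


def build_variables(request_url, post_params=None):
    """Build a dict resembling the adaptor's variables (illustrative only)."""
    parts = urlsplit(request_url)
    variables = {
        "REQUEST_URL": request_url,
        "URI": parts.path,  # path only, no hostname (query string assumed excluded)
        "REQUEST_TIMESTAMP": int(time.time() * 1000),  # assumed format: epoch milliseconds
    }
    # GET parameters from the query string become variables too.
    variables.update(parse_qsl(parts.query))
    # POST parameters, if any, are added in the same way.
    variables.update(post_params or {})
    return variables


v = build_variables("https://www.google.co.uk/search?q=dishwasher")
print(v["URI"], v["q"])
```

For the dishwasher search used later in this example, the sketch yields a URI of /search and a variable q holding the search term.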
Start by actually running a query. For the purpose of this example, go to www.google.com and query the word dishwasher. You will get a country specific page similar to the one shown below. If you don’t see any ads at the top or on the right look at the bottom of the page. In our example, we are using www.google.co.uk.
The goal is to remove the ads along the top, and the ads along the right-hand side.
The next step is to work out how to go about removing the ads.
It is now time to take a closer look "under the hood" to give you an understanding of what just happened. The first thing to look at is the configuration that we just deployed. Select it again for a closer look:
Configurations are what tie a solution together.
Each solution consists of a number of building blocks which can include several rule sets, data files, content files, database configurations, field settings, input source definitions and much more:
To learn what this configuration does, you can review each of the various tabs and look at each rule set. Alternatively, click on Document, and select a target server:
Select the Proxy Server and click on Document.
A new page will appear that contains a complete summary of the configuration:
This page is specifically designed for printing a given configuration for audit purposes, but it is also an excellent way to get a quick understanding of what is going on in a rule set. Just focusing on the rules in this case, scroll to the bottom of the document:
The rule set shown (BasicWebLister) is executed whenever a request is sent from the browser to the server. The rule set is effectively a flow chart, executing from the green dot on the left through the rules towards the right. This is a very simple rule set with no decisions, so the flow should be very clear.
The summary page below the rule set shows the properties set for each rule, but for the sake of understanding, we will elaborate a little further:
The first rule executed is the HTTP Request Tracker rule. This rule takes a basic HTTP request and extracts all of the common header attributes from it (header names, request URL, tracking cookies etc.) and places that information in variables. It also sets tracking cookies (if Use cookies is set to Yes).
The second rule is the MaxMind Geo Info rule. It uses the IP address supplied on the HTTP request and attempts to convert it to a physical location (country and city) using the MaxMind Geo Location database. In this case, the rule returns nothing, as the localhost IP address (127.0.0.1) doesn't resolve to any country in the data lookup.
Finally, the List Variables rule sends all of the variables that have a value to the server console viewer so that the user can examine them, which is what you saw earlier.
The purpose of the configuration we just deployed and tested is to obtain the HTTP request data, augment it with Geolocation and then send the information to the console. If you scroll down the server console viewer, you will notice the various requests coming in, including requests for images, style sheets, icons and so on.
The first thing that is required for a new configuration is a new repository. All data, rules, content and so on, live within Composable Architecture Platform in a repository.
To create a new repository, click on Repositories, enter the name as “Google Ad Remover” and click Create.
This will create the repository. The next step is to figure out what our rules should do. This requires a closer look at what Google does with their search results.
We now have our condition in place, and we can connect it up to the data flow. All data arrives into a rule set from the green dot:
To connect the If Condition to the incoming data, click the ? image on the rule and drag a line to the green dot and then release. An arrow will appear:
All rules work this way. They get their input on the left and exit through one or more "chain points" on the right.
This rule will be used to filter out all the non-search content. As mentioned earlier the web application input adaptor provides the variable URI.
This variable contains just the path part of the request (without the hostname) and is very suitable for this test.
So, click on the If Condition rule. You will see the rules editor change to show the properties for the rule on the left-hand side:
At the very top of the list is always the Label, Rule Class and Description. Label and Description are the short and long descriptions, respectively. The label defaults to the rule name (If Condition in this case). The label is the rule name given in error messages if a problem occurs while starting or executing a rule set. If required, you can change the label to be any short text that you can use to identify the rule.
For each rule, you can also set a description. This should be a short note explaining what the rule is supposed to do in the context of where it is placed.
For now, complete the properties as follows:
You may have noticed that the Value property was entered as “.”. In general, values within the rules editor are treated as follows:
Note: Sometimes it can be difficult to know if a property value requires a constant (no quotes required) or a value (quotes, a number or a variable required). To assist you in knowing what to put, property values are light orange input fields, whereas constants are white.
| Value | Example | Meaning |
| --- | --- | --- |
| Number | 1234 | A decimal number that can be used for calculations. |
| “Text” | “Hello World!” | Text is always enclosed in double quotes. If not, it will be treated as a variable (see below). |
| Variable | FROM_ACCOUNT | A variable is a field that contains data. It can be numeric or text. By convention, variables should be typed in UPPER CASE; however, this is not enforced. Variables may not have commas or double quotes in their name. |
| Array variable | HEADERS | An array variable is essentially a text variable formatted to contain keyed arrays in a format that is readily recognized by applications and browsers (JSON). There are rules available to convert between JSON and CSV formats too. |
| CSV | A,B,C | A list of values separated by commas. If the values are strings, double quotes around them are not required (unless they have a comma in them). |
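The conventions above can be summarised as a small decision procedure. The Python sketch below is a simplified illustration, not the rules editor's actual parser: the function name and the order of the checks are assumptions, and it cannot tell an array variable from an ordinary variable because both are written as a plain name.

```python
import re


def classify_value(raw):
    """Classify a property value using the conventions listed above (simplified)."""
    text = raw.strip()
    # Text is always enclosed in double quotes.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "text"
    # A plain decimal number.
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", text):
        return "number"
    # Variable names may not contain commas, so a comma implies a CSV list.
    if "," in text:
        return "csv"
    # Anything else is treated as a variable name.
    return "variable"


for sample in ['"."', "1234", "FROM_ACCOUNT", "A,B,C"]:
    print(sample, "->", classify_value(sample))
```

This also shows why the dot in the If Condition was entered in quotes: without them, the editor would look for a variable rather than the literal text.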
Now we can verify the browser and proxy configuration. In the browser you chose for browsing via the proxy, type www.google.com in the URL (address) bar and hit enter. You will see the country specific main Google page:
Now switch back to the browser running the console. You should see some activity in the server console viewer. You can enlarge the server console viewer to get a better look:
Without going into too much detail at this stage, what you are seeing is the browser request for each interaction that the browser had with the requested host. You can see items such as the IP addresses, User agent (Browser), Request URL, request method, cookies, protocol scheme etc. This is by no means an exhaustive list of the data Composable Architecture Platform can detect but gives you a general idea.
The thing to take note of at this stage is that you can see all requests, including requests for images as well as JavaScript, CSS and other page elements. This is an important thing to be aware of when writing rules.
The next step is to change the actual server response before it is sent to the user. In our case this change consists of the HTML string replacement we identified earlier. The rule for a string replacement is called String Replacer. Locate it and drag and drop it onto the canvas. Connecting it up should be easy by now:
Notice that we connect both the Found and NotFound chain points to the following rule.
We do this because not all Google pages display ads. This time the properties are set as follows:
There are a number of different ways to work out what actions need to be performed in the rules. In this case, the only action is to alter the response, so we need to determine where to make changes. Browsers like IE, Chrome and Firefox all provide developer tools to help identify specific elements in the page source code. On Windows, press F12 in any of these browsers to open the developer tools. For other platforms, please check the browser help for how to access them. They all work in a similar fashion, but this example only covers Firefox Quantum version 60.0 running on Windows 2012 Server.
Press F12 to open the Inspector:
Click on the HTML inspector tool:
Now select the sponsored ads box:
This is where it is useful to know HTML, especially when dealing with a multinational site such as Google, as the tags tend to change from country to country. In our example, it is worth noticing that there are various advertising tags output within the source of the page.
There is a DIV with ID “rcnt”. To make the ads disappear, you need to hide the tag using inline CSS styles.
To accomplish this:
becomes:
With this information to hand, the next step is to start building a rule set.
IMPORTANT NOTE: Individual versions of Google will differ depending upon operating system, browser, and country. Make sure to work out the right way to make this modification in the version being used.
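The kind of change described here is a plain string replacement on the page source. The following Python sketch shows the idea with hypothetical markup: the opening tag and the inline style used below are assumptions for illustration and are not the exact strings from the original screenshots, so inspect the page being worked on to find the right ones.

```python
# Hypothetical markup: the real tag and its attributes may differ on a given Google page.
ORIGINAL_TAG = '<div id="rcnt"'
HIDDEN_TAG = '<div id="rcnt" style="display:none"'


def hide_element(html):
    """Return (new_html, found) after a plain string replacement."""
    found = ORIGINAL_TAG in html
    return html.replace(ORIGINAL_TAG, HIDDEN_TAG), found


page = '<body><div id="rcnt" class="x">sponsored</div></body>'
new_page, found = hide_element(page)
print(found)
print(new_page)
```

The found flag is returned alongside the text because a page may not contain the tag at all, in which case the source is passed on unchanged.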
Our rule set is now complete, and the next step is to get it running on the Proxy Server. This requires a configuration. Configurations provide all of the instructions for how rule sets obtain their input, how they connect to databases, under what circumstances rule sets are run and so on.
To create a new configuration, click on Configurations in the console administration tree:
The create new configuration page is shown. Select your Google Ad Remover repository, enter the file name RemoveAds and a short description:
Click Create to create the configuration. You will notice that the configuration automatically selects the NoAds rule set. However, if you have multiple rule sets you will need to ensure that you set the correct one.
Normally the starting point for a new rule set is using the New rules wizard. We will cover that later, but for the purpose of simplicity this exercise will instead build a new rule set from scratch. Return to the console and click on Rule Sets:
In the action window select your new repository and give the rule set a name (in this case NoAds) and click on Create:
Note: The rule set name should always be a single word with no spaces.
A new rule set is created, ready for us to edit:
Click on Update to start editing the rule set. A pop-up window will appear showing the rule set in the Rules Editor:
Note: If no pop-up appears, check your browser's pop-up blocker. Pop-ups (though blocked by some users) are useful for the rules editor. They allow you to have many rules editing windows open at the same time and edit them all concurrently (including copying and pasting between them).
We encourage you to expand some of the elements of the Rule Catalog to see what is available. The complete rules reference is also available as a PDF document from the main console page.
At this stage you should also add a short description of what your rule set is going to do. Do this by clicking on the Rule Info tab and keying in a short description of the purpose of the rule set:
The next step is to start building some rules to handle the search result. The first consideration is that the rules should only apply to search results, not items like images, CSS and the like. Normally the New rules wizard would insert a special rule to take care of that problem, but with Google there happens to be a very simple solution: any request that has a dot (.) in it is sure to be non-HTML.
Other sites may use some other consistent extension for pages (such as .php, .jsp or .html), but for Google it is pages with no extension at all.
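Expressed as code, the test is a single substring check. The Python sketch below is only an illustration of that logic; the function name and the sample URIs are made up for this example.

```python
def is_page_request(uri):
    """True when the URI contains no dot, i.e. it is assumed to be a Google HTML page."""
    return "." not in uri


# Sample URIs are hypothetical.
for uri in ["/search", "/images/logo.png", "/xjs/app.js"]:
    print(uri, is_page_request(uri))
```

For a site that serves its pages with a fixed extension, the check would instead test for that extension.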
Therefore, our first task is to filter out all requests with a dot (.) in them, and to do this, we need a condition. Expand the Conditions group and drag an If Condition from the tree onto the canvas:
At this point, move your mouse over the If Condition on your canvas, right click it and select Help. The expanded help for the rule appears:
All rules have this help available. In addition, in the bottom left corner of the rules editor you will also see a summary help notice:
These help features are often useful when trying to find the best rule to suit a specific purpose (as some rules may sound very similar).
The next step is to forward the request from the browser to Google's server so that we can get a result to work with. The rule to use for this is called HTTP Server Execute. Rather than trying to locate the group the rule is in, this time we will search for it. In the rule editor, click the Searchable tab:
In the search box type execute. The search list updates for each character typed and quickly locates the rule as shown:
Once again, drag the rule onto the canvas and then connect the False (URIs that do NOT contain a ".") chain point from the If Condition rule to the input of the HTTP Server Execute rule:
Once again set the properties as follows:
What we are doing here is requesting the response from the server. The response will be loaded into the RESPONSE variable name; the content type of the response will be supplied in the CONTENT variable. We have chosen not to override any part of the request (although in theory you could override the "dishwasher" query to be something else entirely).
Finally, we have elected to obtain the headers and status code from the server as well. We will need all of this information later to send back a proper response to the user.
Our configuration is almost complete and ready to deploy. The last step is to define the input source for the X Engine. Click on the Input Source tab and set the From Server Type to Production, and the Source of data to Receive web application input:
Now that your console is fully operational, we are ready to take you through a basic example that illustrates how to use it. Our example will show you how to remove all advertising from Google search results.
Even though this example has limited real world practical use (unless you wish to run it on your corporate internet gateway), it provides a basic case study that shows many fundamental features.
The final step in this case is very simple. We go back to the Firefox browser and hit refresh on the Google search. Remember that before the result looked like this:
And after the refresh it now looks like this:
So, with a bit of preparation and just four rules, we have transformed the Google search result.
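As a recap, the flow through the rule set (If Condition, HTTP Server Execute, String Replacer, HTTP Response) can be approximated in a few lines of Python. This is a sketch under stated assumptions, not how the X Engine is implemented: the function names, the dictionary used for the response, the pass-through of requests containing a dot, and the replacement strings are all hypothetical, and the server call is replaced by a stand-in so the example runs offline.

```python
def remove_ads(uri, fetch, old, new):
    """Mimic the four-rule flow; `fetch` stands in for HTTP Server Execute."""
    # 1. If Condition: a dot in the URI means images, CSS and the like.
    if "." in uri:
        return fetch(uri)  # assumption: such requests pass through unchanged
    # 2. HTTP Server Execute: obtain the response, headers and status code.
    response = dict(fetch(uri))
    # 3. String Replacer: Found and NotFound both continue to the next step.
    response["body"] = response["body"].replace(old, new)
    # 4. HTTP Response: return the (possibly modified) response to the user.
    return response


def fake_fetch(uri):
    """Stand-in server used only to exercise the sketch."""
    return {"status": 200, "content_type": "text/html",
            "body": '<div id="rcnt">sponsored</div>'}


result = remove_ads("/search", fake_fetch,
                    old='<div id="rcnt"',  # hypothetical markup
                    new='<div id="rcnt" style="display:none"')
print(result["body"])
```

Passing the server call in as a parameter keeps the sketch testable without network access; in the real rule set that role is played by the HTTP Server Execute rule.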
Only one final step remains to complete the rule set: we must return the changed response to the user. This is done with the HTTP Response rule:
The final properties are set:
Our rule set is now complete, so save and exit the editor.
IMPORTANT: If you are using Google Chrome to edit the rules make sure to hit the Save button in the rules editor before closing the pop-up window.
Our configuration is now complete, and we can deploy it to the Proxy Server to test it. Within the configuration page, click on Deploy:
Clicking Deploy does two things:
The configuration is automatically saved
The configuration is automatically verified for errors
If everything is OK, you will see the server selection window. Just like you did with the BasicWebTrial configuration, select the Proxy Server, check the Restart immediately box, and click on Deploy.
Once again you will see the deployment process followed by the status screen showing ready:
Preparing the Browser Proxy
Setting up the Proxy in the Browser
Verifying the Browser Configuration
Understanding the Configuration
Understanding input and variables
Preparing a New Repository
Locating the Page to Modify
Determining the Actions Required
Building the First Rule Set
Setting Rule Properties
Connecting up the First Rule
Getting a Server Result
Manipulating the Server Result
Returning the Result to the User
Creating a Configuration for the Rule Set
Selecting the Input Source
Deploying the New Configuration
Testing the Rules