A web crawler capable of traversing any site with custom environment variables.
Navigate to the repo where you would like to run Cruller, then run the following command to install Cruller into your project.
npm i cruller
Ensure you have a testing suite installed if you would like to use one, as Cruller is agnostic to how it is used and will not provide one. Be sure to run npm init and ensure that the test command matches your test suite of choice. No other fields are required.
Run the following command to generate a template project structure, containing sample pages and tests in the root of where your automated test suite will be located.
npx gen-fs
A file called cruller.config.js will have been created. The following parameters are set within this file.
Any number of environment variables can be set for your crawls. You can include as many instances of each variable as needed. By default, Cruller will perform one crawl pass for each permutation of your environment variable combinations. Cruller includes two variables in every project:
- baseurl: the origin landing page of your project.
- breakpoints: size breakpoint between devices.
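To make "one crawl pass for each permutation" concrete, the sketch below builds every combination of a set of variable values. It is only an illustration: the function name `permutations` and the sample values are assumptions, and this is not how Cruller itself is implemented.

```javascript
// Hypothetical environment variable values; the names mirror the two
// defaults described above, the values are invented for illustration.
const environmentVariables = {
  baseurl: ['staging', 'production'],
  breakpoints: ['mobile', 'tablet', 'desktop'],
};

// Build every combination (Cartesian product) of the variable values.
function permutations(variables) {
  return Object.entries(variables).reduce(
    (combos, [name, values]) =>
      combos.flatMap((combo) =>
        values.map((value) => ({ ...combo, [name]: value }))
      ),
    [{}]
  );
}

const crawlPasses = permutations(environmentVariables);
console.log(crawlPasses.length); // 2 baseurl values x 3 breakpoints = 6
```

Adding a third variable with two values would double the number of passes, so it is worth keeping the value lists short.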
Location of all page stamps. Using the gen-fs command will point this to the default directory. However, if you prefer you can keep your stamps elsewhere.
Location of your browserless installation. Leave blank if none is being used.
Define any steps required to configure the startup state of your crawler.
By default, the configuration of baseurl and breakpoints will be defined here.
Be sure to set up your custom environment variables.
A function that contains custom instructions for specific permutations of environment variables which do not occur normally.
All axe configurations are supported and can be set for your crawls. Refer to Axe-Configurations for possible options.
This is used to set axe report configurations for your crawls. By default, Cruller allows setting:
- Report file type: supported file types are tsv and csv.
- Report file name: the report will be generated with this file name.
- createNewFile flag: a Boolean flag. Set it to true if a new report file is required per page; otherwise set it to false. In the latter case, the flag value should be explicitly passed as true to the accessibilityCheck method the first time the method is called.
await [project name].accessibilityCheck([project name].page, true);
Stamps are how Cruller creates page objects and methods. Every page Cruller runs on will have stamps defining its props and methods.
- Every page must export at least a Base Page. For example, the Home page export would look like this:
module.exports = { homeBase };
- The base stamp is the collection of props and methods that will be used on the given page by default, across all permutations of your environment variables, unless overwritten for specific environment variables with a separate stamp.
- When naming Page Stamps, follow the naming convention for base stamps: lowercase page name + Base. For example, the Home Page base stamp would be named homeBase.
- If stamps for specific environment variables are needed, use the following naming convention: lowercase page name + [environment variable name]. For example, the Home Page Mobile stamp would be named homeMobile.
- Stamps for specific environment variables will overwrite any props or methods with the same name provided in the base stamp. This is especially useful if the same prop or method requires different inputs depending on the environment, as your crawl will pick the correct stamp depending on the permutation.
- When exporting more than one page stamp per page, append any additional stamps to the export statement, as follows:
module.exports = { homeBase, homeMobile };
- There is no need to include a separate Page Stamp for any environment variable that has the same properties and methods as your base stamp.
- For example stamps, see stamps/pages/home.js and stamps/shared/shared.js.
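The naming and overwrite rules in the list above can be sketched with plain objects. This is an illustration only: real stamps are built with stampit, and the helper names `stampName` and `resolveStamp`, the `menu` prop, and all selector values are hypothetical.

```javascript
// Hypothetical stamps for the Home page, written as plain objects so the
// sketch runs without stampit installed.
const homeBase = {
  pageUrl: '/',
  link: '.link_to_homepage',
  menu: '.desktop_menu',
};

// Only the prop that differs on mobile is redefined.
const homeMobile = {
  menu: '.hamburger_menu',
};

const homeStamps = { homeBase, homeMobile };

// Build a stamp name: lowercase page name + "Base" or the variable name.
function stampName(pageName, suffix = 'Base') {
  return pageName.toLowerCase() + suffix;
}

// Merge the base stamp with the environment-specific stamp, if one exists.
// Later sources win, so the specific stamp overwrites same-named props.
function resolveStamp(stamps, pageName, variableName) {
  const base = stamps[stampName(pageName)];
  const specific = stamps[stampName(pageName, variableName)] || {};
  return { ...base, ...specific };
}

const mobileHome = resolveStamp(homeStamps, 'Home', 'Mobile');
const tabletHome = resolveStamp(homeStamps, 'Home', 'Tablet');
console.log(mobileHome.menu); // .hamburger_menu (overwritten by homeMobile)
console.log(mobileHome.link); // .link_to_homepage (inherited from homeBase)
console.log(tabletHome.menu); // .desktop_menu (no tablet stamp, base is used)
```

Because the tablet permutation has no stamp of its own, it falls back to the base stamp, which is why a separate stamp is unnecessary when nothing differs.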
Contains page stamps for individual pages.
- Any stamps created must be included in the index.js file of that directory.
WIP: Contains stamps usable across several pages on the site.
Contains stamps usable on every page on the site.
Props are how stampit adds properties to your Page Object. Every prop on a page will be a key-value pair which can be used by the Page's methods.
- To increase the readability of your methods, it is recommended that every CSS selector used in your methods is given a descriptive prop. Be sure to indicate the type of selector being used with the proper notation (# for an ID, and . for a class). A prop called link for the class link_to_homepage would be set using:
link : '.link_to_homepage'
Methods are stampit functions associated with your page object. Since this is a web crawler, it is recommended that only asynchronous functions are used, with each step being preceded by an await to ensure your methods execute in the correct order. All Puppeteer functions are available.
- Ensure that Puppeteer functions, and any props they use, reference the Page Object. The proper syntax for a statement in which the link prop is clicked would therefore be the following:
await this.puppeteerPage.click(this.link);
- However, you are able to create statements which use non-Puppeteer functions. Cruller includes 5 helper methods available within every stamp:
  - visit(): visits the page at the baseUrl, with the string in the pageUrl prop (which you must define on each page) appended to the end of the URL.
  - waitClick(prop): waits for the prop and then clicks it once it is visible.
  - waitClickNavigate(prop): waits for the prop, clicks it once it is visible, and waits for navigation on the page to complete.
  - emptyField(prop): deletes all text content of the specified prop.
  - clickByIndex(prop, index): clicks on a particular instance of a given prop.
- For these, and any other non-Puppeteer functions, a different syntax is used. The following would click the same link prop used in prior examples using the waitClick() function:
await this.waitClick('link');
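The difference between the two call styles can be seen side by side in the sketch below. It uses a stand-in object instead of a real Puppeteer page so that it runs without a browser, and the body of `waitClick` is an assumed shape for such a helper, not Cruller's actual implementation. The names `fakePuppeteerPage`, `homePage`, `goHomeDirect`, and `goHomeWithHelper` are hypothetical.

```javascript
// A stand-in for a Puppeteer page that records calls, so the sketch runs
// without a browser. waitForSelector and click exist on real Puppeteer pages.
const calls = [];
const fakePuppeteerPage = {
  async waitForSelector(selector, options) {
    calls.push(['waitForSelector', selector, options]);
  },
  async click(selector) {
    calls.push(['click', selector]);
  },
};

// Hypothetical page object: props are CSS selectors, methods use them.
const homePage = {
  puppeteerPage: fakePuppeteerPage,
  link: '.link_to_homepage',

  // Assumed shape of a waitClick-style helper: it takes the prop NAME,
  // looks up the selector on the page object, waits, then clicks.
  async waitClick(prop) {
    const selector = this[prop];
    await this.puppeteerPage.waitForSelector(selector, { visible: true });
    await this.puppeteerPage.click(selector);
  },

  async goHomeDirect() {
    // Puppeteer function: pass the selector value itself.
    await this.puppeteerPage.click(this.link);
  },

  async goHomeWithHelper() {
    // Helper method: pass the prop name as a string.
    await this.waitClick('link');
  },
};

// Run both methods in order, then report how many page calls were recorded.
const done = homePage
  .goHomeDirect()
  .then(() => homePage.goHomeWithHelper())
  .then(() => console.log(calls.length));
```

The helper receives the string 'link' rather than this.link because, in this sketch, it resolves the selector from the page object itself.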
- Cruller can support any test suite; however, it was created with Jest in mind. Jest test files are required to have the structure testname.test.js. Be sure to install Jest in your project if you decide to use it, as it is not included with Cruller.
- Every test should contain a beforeAll that launches a new instance of the crawler using Chromium and runs the startUp command, which takes two objects as parameters:
  - perms: Permutations needed for the test. Baseurl and breakpoint are the two included with Cruller. Be sure to include any additional environment variables.
  - opts: Puppeteer connect options. These are not required on any given test. If none are provided, Puppeteer's default connection settings will be used.
- Every test should contain an afterAll that closes the browser.
- Each test will be a series of steps in the following format until the crawl has completed all steps.
await [project name].[page name]Page.[method];
- Any set of assertions can be implemented, but assertions are not necessary to run tests. Be sure to establish where assertions are kept in your package.json file.
- A sample test is provided in tests/sample.test.js.
- Run npm test on the command line within the directory your tests are located in.
- You can specify on the command line to run only certain permutations of your tests. The following command would run all tests using only the permutation of Google as your baseurl running at a tablet-sized breakpoint.
BASEURL="google" BREAKPOINT="tablet" npm test
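One way to picture this filtering is shown below. It is a sketch under stated assumptions: it assumes each variable is matched against an upper-cased environment variable of the same name, as the command above suggests, and the function `selectPermutations` and the sample values are hypothetical rather than part of Cruller.

```javascript
// Keep only the permutations that match the variables set on the command
// line. An unset variable places no restriction on the crawl.
function selectPermutations(allPermutations, env) {
  return allPermutations.filter((permutation) =>
    Object.entries(permutation).every(([name, value]) => {
      const requested = env[name.toUpperCase()];
      return requested === undefined || requested === value;
    })
  );
}

// Hypothetical permutations; the values are invented for illustration.
const allPermutations = [
  { baseurl: 'google', breakpoint: 'mobile' },
  { baseurl: 'google', breakpoint: 'tablet' },
  { baseurl: 'example', breakpoint: 'tablet' },
];

// Equivalent of: BASEURL="google" BREAKPOINT="tablet" npm test
const selected = selectPermutations(allPermutations, {
  BASEURL: 'google',
  BREAKPOINT: 'tablet',
});
console.log(selected.length); // 1 matching permutation
```

In a real run the second argument would be process.env. Setting only BASEURL would keep both Google permutations, since the breakpoint is then unrestricted.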
- We use commitizen to format our commits. Run npm run commit after staging your changes to trigger the commitizen CLI. Provide the type of change, component, a short description, and the ticket number. The other fields can be left blank.
- We use axe-puppeteer to test for accessibility and axe-reports to generate an accessibility violation report at the project root. Supported report formats are tsv and csv.
- To run an accessibility check, crawl to the desired page. After navigating to the page, pass the page reference to the accessibilityCheck method as follows:
await [project name].accessibilityCheck([project name].page);
- The accessibilityCheck method accepts two parameters: the expected page and the createNewFile flag. If the flag is not provided as an argument, it defaults to the value set in the config file.
- Sample Jest matchers for accessibility validation can be referenced at __tests__/accessibility.test.js