commit d2eb7efa3807cf98ccad45860df164e5df7b4868
Author: Tamas Szirtesi
Date:   Mon Oct 9 14:19:20 2023 +0200

    Added README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a551faa
--- /dev/null
+++ b/README.md
@@ -0,0 +1,36 @@
+# Help Center Spider
+## About
+This is a spider tool that visits every link on https://docs.otc.t-systems.com to find incorrect URLs.
+
+## Requirements
+After cloning the repository, you need to prepare an environment to run the tool. The easiest way is to use a
+Python virtual environment:
+
+```
+$ cd /
+$ python -m venv venv/
+$ python -m pip install -r requirements.txt
+```
+
+## Configuration
+In _config.json_ you can define the following items:
+
+- _watchdog_file_: if you run the tool in the background and want to stop it cleanly (without using `kill`),
+just send an exit message to the watchdog file: `echo exit > watchdog.fifo`
+- _timer_runtime_: maximum runtime in seconds
+- _log_dir_: directory for log files
+- _logging_interval_: how often log files are dumped
+- _workers_: number of workers (background processes) to run. If set to 0, the number is derived from the number of cores (_number_of_cores_ - 1)
+- _starting_point_: base URL where the crawl starts
+
+## How to run
+There are two ways to run the tool:
+
+### In foreground
+`$ python main.py`
+
+### In background
+`$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &`
+
+If you run the tool in the background, you can stop it with `$ echo exit > `
+
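
The README's rule for _workers_ (0 means "number of cores minus one") could be sketched roughly as below. This is only an illustration, not the tool's actual code: the function name `resolve_worker_count`, the floor of one worker, and all values in the sample config are assumptions. Only the key names come from the README.

```python
import json
import os
from typing import Optional


def resolve_worker_count(configured: int, cpu_count: Optional[int] = None) -> int:
    """Return the number of worker processes to start.

    Assumption: 0 means "derive from the core count" (cores - 1), and the
    result is never allowed to drop below 1 so a single-core host still runs.
    """
    if configured < 0:
        raise ValueError("workers must be >= 0")
    if configured > 0:
        return configured
    # os.cpu_count() may return None, so fall back to 1 in that case.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores - 1)


# Hypothetical config.json content; the values are made up for illustration.
example_config = json.loads("""
{
    "watchdog_file": "watchdog.fifo",
    "timer_runtime": 3600,
    "log_dir": "log",
    "logging_interval": 60,
    "workers": 0,
    "starting_point": "https://docs.otc.t-systems.com"
}
""")

# Pass cpu_count explicitly here so the example is deterministic.
workers = resolve_worker_count(example_config["workers"], cpu_count=8)
print(workers)
```

Leaving one core free is a common choice that keeps the main process responsive while the workers crawl. Whether the tool enforces a minimum of one worker is not stated in the README.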