
Node.js program hosted on Heroku which uses the Reddit API to constantly read from all comments. Reads from a PostgreSQL database to find which comments to look for, and send POST requests when found


LionelBergen/reddit-comment-reader


Reddit Comment Reader

Reads comments from all of Reddit and picks out phrases, then sends any matches to 'localhost' when run locally, or to another Heroku application when run on Heroku.

npm run start - Starts the program
npm run test - Runs tests, not including 'live' tests, which require environment variables filled with tokens
npm run eslint - Used to keep formatting consistent. This should pass before every commit

Quick Start

  1. Have the PostgreSQL service running
  2. Ensure you have a database user postgres with password postgresql (or modify the script below to use the correct username/password)
  3. Run reddit-comment-reader\database\create_local_database.bat (or the .sh version on Linux). This will drop the database if it exists and recreate it
  4. Create the environment variables DATABASE_URL, OUTPUT_URL, REDDIT_USERNAME, REDDIT_PASSWORD, REDDIT_APP_ID and REDDIT_APP_SECRET, or create a .env file with these values. See example.env for an example
  5. Install dependencies by running npm install
  6. Start the application by running npm run start
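
Since step 4 is easy to get wrong, a startup check along these lines can report which variables are missing. This is only a sketch: the variable names come from the list above, but the function name findMissingVars is hypothetical and not part of this repository.

```javascript
// Names taken from Quick Start step 4.
const REQUIRED_VARS = [
  'DATABASE_URL',
  'OUTPUT_URL',
  'REDDIT_USERNAME',
  'REDDIT_PASSWORD',
  'REDDIT_APP_ID',
  'REDDIT_APP_SECRET',
];

// Hypothetical helper: returns the names of required variables that are
// unset or empty in the given environment object.
function findMissingVars(env = process.env) {
  return REQUIRED_VARS.filter((name) => !env[name]);
}

// Possible use at startup:
// const missing = findMissingVars();
// if (missing.length > 0) {
//   console.error('Missing environment variables: ' + missing.join(', '));
//   process.exit(1);
// }
```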

Database Connection

The database connection string is expected to be in the environment variable 'DATABASE_URL'

Example: SET DATABASE_URL=postgres://postgres:postgresql@localhost:5432/reddit_comment_reader

Note: on Windows I get an error when setting the above, but it works regardless of the error
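
To illustrate what the connection string contains, the sketch below splits it into its parts using Node's built-in URL class. The function name parseDatabaseUrl and the fallback port 5432 are assumptions for illustration; the application itself may pass the string straight to its database client.

```javascript
// Hypothetical helper: breaks a postgres:// connection string into its parts.
function parseDatabaseUrl(databaseUrl) {
  const url = new URL(databaseUrl);
  return {
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    host: url.hostname,
    // Assumption: fall back to the usual PostgreSQL port when none is given.
    port: Number(url.port) || 5432,
    // The path is "/<database name>", so drop the leading slash.
    database: url.pathname.replace(/^\//, ''),
  };
}

const parsed = parseDatabaseUrl(
  'postgres://postgres:postgresql@localhost:5432/reddit_comment_reader'
);
console.log(parsed.database);
```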

Database Tables

RegexpComment - Holds the phrases to look for. These are read from the PostgreSQL database

Columns: SubredditMatch, CommentMatch, ReplyMessage, IsReplyRegexp, id

RegexpComment creation script:
-- Table: public."RegexpComment"
-- DROP TABLE public."RegexpComment";

CREATE TABLE public."RegexpComment"
(
	"SubredditMatch" text COLLATE pg_catalog."default" NOT NULL DEFAULT '.*'::text,
	"CommentMatch" text COLLATE pg_catalog."default" NOT NULL,
	"ReplyMessage" text COLLATE pg_catalog."default" NOT NULL,
	"IsReplyRegexp" boolean DEFAULT false,
	id integer NOT NULL DEFAULT nextval('"RegexpComment_id_seq"'::regclass)
)
WITH (
	OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public."RegexpComment"
	OWNER to uuhsiyqcwwsszg;
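
As a rough illustration of how rows from this table could be applied to a comment, the sketch below treats SubredditMatch and CommentMatch as regular expressions. The function name findMatches, the shape of the comment object, and the case-insensitive, unanchored matching are all assumptions, and IsReplyRegexp is not handled here.

```javascript
// Hypothetical helper: returns the RegexpComment rows that match a comment.
// `comment` is assumed to look like { subreddit: 'AskReddit', body: '...' }.
function findMatches(rows, comment) {
  return rows.filter((row) => {
    // Assumption: patterns are matched case-insensitively and unanchored.
    const subredditPattern = new RegExp(row.SubredditMatch, 'i');
    const commentPattern = new RegExp(row.CommentMatch, 'i');
    return (
      subredditPattern.test(comment.subreddit) &&
      commentPattern.test(comment.body)
    );
  });
}

// Illustrative rows only; these are not real data from the database.
const exampleRows = [
  { id: 1, SubredditMatch: '.*', CommentMatch: 'hello\\s+world', ReplyMessage: 'Hi!', IsReplyRegexp: false },
  { id: 2, SubredditMatch: '^test$', CommentMatch: 'hello', ReplyMessage: 'Hey', IsReplyRegexp: false },
];

const matched = findMatches(exampleRows, { subreddit: 'AskReddit', body: 'Hello   world' });
console.log(matched.map((row) => row.id));
```

The default SubredditMatch of '.*' matches every subreddit, which is why row 1 applies regardless of where the comment was posted.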

ErrorTable - Errors are logged here. The application is hosted on Heroku, which doesn't keep a separate log for errors

Columns: id, ErrorDescription, ErrorTrace, AdditionalInfo, CreatedOn

ErrorTable creation script:
-- Table: public."ErrorTable"
-- DROP TABLE public."ErrorTable";

CREATE TABLE public."ErrorTable"
(
	id integer NOT NULL DEFAULT nextval('errortable_id_seq'::regclass),
	errordescription character varying(255) COLLATE pg_catalog."default",
	errortrace character varying(5000) COLLATE pg_catalog."default",
	additionalinfo character varying(1000) COLLATE pg_catalog."default",
	createdon timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
	CONSTRAINT errortable_pkey PRIMARY KEY (id)
)
WITH (
	OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public."ErrorTable"
	OWNER to uuhsiyqcwwsszg;
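
Because the text columns have fixed maximum lengths (255, 5000 and 1000 characters), anything logged has to fit them. The sketch below builds a parameterized insert and truncates each value first. The function names are hypothetical, and the { text, values } shape assumes a client such as node-postgres, which this README does not confirm is the one in use.

```javascript
// Column limits taken from the ErrorTable creation script above.
const ERROR_COLUMN_LIMITS = {
  errordescription: 255,
  errortrace: 5000,
  additionalinfo: 1000,
};

// Converts a value to a string no longer than `max`; null/undefined become null.
function truncateForColumn(value, max) {
  if (value === null || value === undefined) return null;
  const text = String(value);
  return text.length > max ? text.slice(0, max) : text;
}

// Hypothetical helper: builds a parameterized INSERT for an Error object.
// id and createdon are omitted so their column defaults apply.
function buildErrorInsert(error, additionalInfo) {
  return {
    text:
      'INSERT INTO public."ErrorTable" (errordescription, errortrace, additionalinfo) ' +
      'VALUES ($1, $2, $3)',
    values: [
      truncateForColumn(error.message, ERROR_COLUMN_LIMITS.errordescription),
      truncateForColumn(error.stack, ERROR_COLUMN_LIMITS.errortrace),
      truncateForColumn(additionalInfo, ERROR_COLUMN_LIMITS.additionalinfo),
    ],
  };
}
```

With node-postgres, the returned object could be passed as-is to client.query(...).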

Reddit API connection

Originally, this program did not use an authenticated client, since none was required for reading data from Reddit's API.
Since Reddit's 2023 API changes, this program uses an authenticated client. The number of comments that can be retrieved has also decreased.

Uses an https client require('https') to make requests to 'reddit.com/all/comments.json' and occasionally 'reddit.com/subreddit/moderators.json'.

Sending data

Uses Faye require('faye') to send data to either another Heroku application or localhost when comments are found matching regular expressions taken from the database

Other notes

Program is hardcoded to ignore moderator comments. This is done by querying the moderators URL for the appropriate subreddit. The result is kept in a variable, so the request for a given subreddit is made only once per program run
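
The once-per-subreddit lookup can be sketched as a small cache. The names createModeratorCheck and fetchModerators are hypothetical; fetchModerators stands in for whatever function requests the moderator list and is assumed to return an array of usernames (or a promise of one).

```javascript
// Hypothetical helper: wraps a moderator lookup so that each subreddit is
// requested at most once for the lifetime of the process.
function createModeratorCheck(fetchModerators) {
  const cache = new Map(); // subreddit name -> Promise<Set<string>>

  return async function isModerator(subreddit, author) {
    const key = subreddit.toLowerCase();
    if (!cache.has(key)) {
      // Store the promise itself so concurrent callers share one request.
      cache.set(
        key,
        Promise.resolve(fetchModerators(key)).then(
          (names) => new Set(names.map((name) => name.toLowerCase()))
        )
      );
    }
    const moderators = await cache.get(key);
    return moderators.has(author.toLowerCase());
  };
}
```

Note that in this sketch a failed lookup stays cached as a rejected promise; a real implementation would probably want to evict it and retry.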

Makes a request to a Reddit URL every 1100 milliseconds. Reddit may block connections that make requests less than 1000 milliseconds apart, and I've found that using that exact limit causes issues.
As of 2023, Reddit has silently reduced the number of comments that can be retrieved.
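
A minimal sketch of that pacing, assuming the 1100 millisecond spacing is measured between the start of one request and the start of the next. The names msUntilNextRequest, startPolling and fetchOnce are hypothetical and not taken from this repository.

```javascript
const POLL_INTERVAL_MS = 1100; // value from the note above

// How long to wait so that requests start at least `intervalMs` apart.
function msUntilNextRequest(lastStartedAt, now, intervalMs = POLL_INTERVAL_MS) {
  return Math.max(0, intervalMs - (now - lastStartedAt));
}

// Calls `fetchOnce` repeatedly, never starting two calls less than
// `intervalMs` apart. Returns a function that stops the loop.
function startPolling(fetchOnce, intervalMs = POLL_INTERVAL_MS) {
  let stopped = false;
  let timer = null;

  async function tick() {
    if (stopped) return;
    const startedAt = Date.now();
    try {
      await fetchOnce();
    } catch (err) {
      console.error('Request failed:', err);
    }
    if (stopped) return;
    timer = setTimeout(tick, msUntilNextRequest(startedAt, Date.now(), intervalMs));
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Chaining setTimeout instead of using setInterval means a slow response delays the next request rather than letting requests pile up.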

Ignores comments from blacklisted subreddits. Some serious subreddits are hardcoded to be ignored, such as /r/depression
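
A sketch of such a check is shown below. Only /r/depression is named above, so the set contains just that entry; the real list in the code is longer, and the function name isBlacklisted is hypothetical.

```javascript
// Only the subreddit named in this README; the actual hardcoded list differs.
const BLACKLISTED_SUBREDDITS = new Set(['depression']);

// Accepts 'depression', 'r/depression' or '/r/Depression'.
function isBlacklisted(subreddit) {
  const name = subreddit.replace(/^\/?r\//i, '').toLowerCase();
  return BLACKLISTED_SUBREDDITS.has(name);
}
```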

Doesn't post the same comment to the same subreddit too many times within a duration
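
One way to express that limit is a per-subreddit sliding window, sketched below. The function name createReplyLimiter and the default values (one post per hour) are assumptions; the actual thresholds used by the program are not stated here.

```javascript
// Hypothetical helper: allows at most `maxPosts` posts of the same reply
// to the same subreddit within any `windowMs` period.
function createReplyLimiter({ maxPosts = 1, windowMs = 60 * 60 * 1000 } = {}) {
  const history = new Map(); // "subreddit:replyId" -> timestamps of recent posts

  return function tryPost(subreddit, replyId, now = Date.now()) {
    const key = `${subreddit.toLowerCase()}:${replyId}`;
    // Keep only the posts that are still inside the window.
    const recent = (history.get(key) || []).filter((t) => now - t < windowMs);
    if (recent.length >= maxPosts) {
      history.set(key, recent);
      return false; // too many recent posts, skip this one
    }
    recent.push(now);
    history.set(key, recent);
    return true;
  };
}
```

Here replyId could be the id column of the matching RegexpComment row, so that different replies in the same subreddit are limited independently.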
