SBlocker
Email spam detector and classifier
Sblocker generates vocabularies and corpora, learns from them, and classifies a text into different types of data.
Run the program with the flag for the desired task: generate the vocabulary (-v || --vocabulary), generate corpora (-co || --corpus), learn probabilities (-l || --learner), print the user manual (-h || --help), classify test data (-cl || --classify), calculate the success and error percentages (-e || --error), update the emails database (-u || --updateDatabase), or generate test files (-t || --generateTest).
-v (--vocabulary) Generates the vocabulary from the input file into the output file, excluding the words in the reserved words file.
    Example: $ Sblocker -v inputFile outputFile rwFile
    Definition:
        inputFile   Path to the file with the full data to be processed.
        outputFile  Path where the program will store the result.
        rwFile      Path to the file where the reserved words are stored.
-co (--corpus) Generates one corpus for each data type given as an argument after the input file and the reserved words file.
    Example: $ Sblocker -co inputFile rwFile Corpus1 Corpus2 ... CorpusX
    Definition:
        inputFile   Path to the file with the full data to be processed.
        rwFile      Path to the file where the reserved words are stored.
        CorpusX     Name of a data type to search for and build a corpus from. Each line of the input file must start with its data type, separated from the rest of the line by a ",", as in the first column of a CSV file.
-h (--help) Prints the user manual for the program.
    Example: $ Sblocker -h
-l (--learner) Calculates the probabilities of each token for each corpus and stores them in files named learned_CORPUS.txt in the output folder.
    Example: $ Sblocker -l vocFile Corpus1 Corpus2 ... CorpusX
    Definition:
        vocFile     Vocabulary file generated with the --vocabulary option.
        CorpusX     Name of a data type, in the same format as for --corpus (first column of each input line, followed by a ",").
-cl (--classify) Classifies the given input into the classes read from the learned data.
    Example: $ Sblocker -cl corpusTest rwFile Learn1 Learn2 ... LearnN
    Definition:
        corpusTest  Path to a file with one sentence to classify on each line.
        rwFile      Path to the file where the reserved words are stored.
        LearnN      Path to a file with the learned data produced by the -l option.
-e (--error) Calculates the error and success percentages of the classification.
    Example: $ Sblocker -e resumeData expectData
    Definition:
        resumeData  Path to the summary (resume) file generated by the classifier.
        expectData  Path to the expected data. It must use the same format as resumeData: one class per line, one line per sentence, in the same order as the original corpusTest.
-u (--updateDatabase) Updates the emails database by adding all the emails in the inputs/GenerateMails folder.
-t (--generateTest) Generates the test files for the program by taking a random 20% of the data in mail-train.csv.
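The manual does not say which model sits behind --learner and --classify. The Python sketch below is one plausible reading, not SBlocker's implementation: it assumes a multinomial naive Bayes classifier with add-one (Laplace) smoothing, a simple lowercase word tokenizer, and input lines of the form label,text. All function names (tokenize, build_vocabulary, build_corpora, learn, classify, error_rates, split_test) are hypothetical and only mirror the -v, -co, -l, -cl, -e and -t options; the real program reads and writes files rather than in-memory lists.

```python
import math
import random
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric words, minus reserved words.
# SBlocker's real tokenization rules are not documented in this manual.
def tokenize(text, reserved=frozenset()):
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in reserved]

def build_vocabulary(texts, reserved=frozenset()):
    """Like -v: sorted distinct tokens, excluding reserved words."""
    vocab = set()
    for text in texts:
        vocab.update(tokenize(text, reserved))
    return sorted(vocab)

def build_corpora(lines, classes, reserved=frozenset()):
    """Like -co: group lines by the data type in their first CSV column."""
    corpora = {c: [] for c in classes}
    for line in lines:
        label, _, text = line.partition(",")
        if label in corpora:
            corpora[label].append(tokenize(text, reserved))
    return corpora

def learn(vocab, corpora):
    """Like -l (assumed model): add-one smoothed log P(token | class)."""
    vocab = set(vocab)
    total_docs = sum(len(docs) for docs in corpora.values())
    model = {}
    for label, docs in corpora.items():
        counts = Counter(t for doc in docs for t in doc if t in vocab)
        # +1 in the denominator reserves probability mass for unknown tokens.
        denom = sum(counts.values()) + len(vocab) + 1
        logprob = {t: math.log((counts[t] + 1) / denom) for t in vocab}
        prior = math.log(len(docs) / total_docs) if docs else float("-inf")
        model[label] = (prior, logprob, math.log(1 / denom))
    return model

def classify(text, model, reserved=frozenset()):
    """Like -cl: return the class with the highest log-probability."""
    tokens = tokenize(text, reserved)

    def score(label):
        prior, logprob, unknown = model[label]
        return prior + sum(logprob.get(t, unknown) for t in tokens)

    return max(model, key=score)

def error_rates(predicted, expected):
    """Like -e: success and error percentages over line-aligned labels."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    success = 100.0 * hits / len(expected)
    return success, 100.0 - success

def split_test(lines, fraction=0.2, seed=None):
    """Like -t (assumed): hold out a random 20% of the lines as test data."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    k = round(len(shuffled) * fraction)
    return shuffled[k:], shuffled[:k]

# Toy data, invented for illustration only.
train = [
    "spam,win a free prize now",
    "spam,free money click now",
    "ham,meeting moved to monday",
    "ham,see you at the meeting",
]
vocab = build_vocabulary(line.partition(",")[2] for line in train)
model = learn(vocab, build_corpora(train, ["spam", "ham"]))
print(classify("free prize", model))      # spam
print(classify("monday meeting", model))  # ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long emails, which is the usual reason naive Bayes classifiers store log values per token.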
Returns success unless an error occurs.
Written by Fabio Ovidio Bianchini Cano, Óscar Hernández Díaz & Adrián Epifanio Rodríguez Hernández.
Report an issue at https://github.com/AdrianEpi/SBlocker