Commit c8e1292

Updated DupliPy
1 parent 00c2d0b commit c8e1292

8 files changed

Lines changed: 170 additions & 8 deletions

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+name: Publish Python 🐍 distributions 📦 to PyPI
+
+on:
+  push:
+    tags:
+      - '*'
+
+jobs:
+  build-n-publish:
+    name: Build and publish Python 🐍 distributions 📦 to PyPI
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@master
+    - name: Set up Python 3.12
+      uses: actions/setup-python@v3
+      with:
+        python-version: '3.12'
+    - name: Install pypa/setuptools
+      run: >-
+        python -m
+        pip install setuptools wheel
+    - name: Extract tag name
+      id: tag
+      run: echo ::set-output name=TAG_NAME::$(echo $GITHUB_REF | cut -d / -f 3)
+    - name: Update version in setup.py
+      run: >-
+        sed -i "s/{{VERSION_PLACEHOLDER}}/${{ steps.tag.outputs.TAG_NAME }}/g" setup.py
+    - name: Build a binary wheel
+      run: >-
+        python setup.py sdist bdist_wheel
+    - name: Publish distribution 📦 to PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
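
Review note: the "Extract tag name" and "Update version in setup.py" steps take the third `/`-separated field of `GITHUB_REF` and substitute it for `{{VERSION_PLACEHOLDER}}`. The sketch below mirrors that logic in Python so it can be checked locally. The function names are hypothetical and not part of this commit, and it assumes the ref has the form `refs/tags/<tag>`. It also shows one consequence of `cut -d / -f 3`: a tag that itself contains a slash would be truncated.

```python
def tag_from_ref(github_ref: str) -> str:
    """Mimic `cut -d / -f 3`: return the third '/'-separated field.

    Assumes the ref has at least three fields, e.g. 'refs/tags/0.2.1'.
    """
    return github_ref.split("/")[2]


def full_tag_from_ref(github_ref: str) -> str:
    """Hypothetical alternative: keep everything after 'refs/tags/'."""
    prefix = "refs/tags/"
    if not github_ref.startswith(prefix):
        raise ValueError(f"not a tag ref: {github_ref!r}")
    return github_ref[len(prefix):]


def apply_version(setup_source: str, version: str) -> str:
    """Replace every placeholder occurrence, as the sed `s/.../.../g` step does."""
    return setup_source.replace("{{VERSION_PLACEHOLDER}}", version)


if __name__ == "__main__":
    ref = "refs/tags/0.2.1"
    print(apply_version("version='{{VERSION_PLACEHOLDER}}',", tag_from_ref(ref)))
```

For a plain tag such as `0.2.1` both helpers agree. They differ only when the tag name contains `/`.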

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+__pycache__
+*.pyc
+dist/
+build/
+venv/

duplipy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 import duplipy
-from .formatting import remove_stopwords, remove_numbers, remove_whitespace, normalize_whitespace, separate_symbols, remove_special_characters, standardize_text, tokenize_text, stem_words, lemmatize_words, pos_tag
+from .formatting import remove_stopwords, remove_numbers, remove_whitespace, normalize_whitespace, separate_symbols, remove_special_characters, standardize_text, tokenize_text, stem_words, lemmatize_words, pos_tag, remove_profanity_from_text, remove_sensitive_info_from_text, remove_hate_speech_from_text
 from .replication import replace_word_with_synonym, augment_text_with_synonyms, load_text_file, augment_file_with_synonyms, insert_random_word, delete_random_word, insert_synonym, paraphrase, flip_horizontal, flip_vertical, rotate, random_rotation, resize, crop, random_crop, shuffle_words
 from .similarity import edit_distance_score, bleu_score, jaccard_similarity_score
 from .text_analysis import analyze_sentiment

duplipy/formatting.py

Lines changed: 63 additions & 1 deletion
@@ -13,12 +13,16 @@
 - `stem_words(words)`: Stem the input words using Porter stemming algorithm.
 - `lemmatize_words(words)`: Lemmatize the input words using WordNet lemmatization.
 - `pos_tag(text)`: Perform part-of-speech (POS) tagging on the input text.
+- `remove_profanity_from_text(text)`: Remove profane words from the input text.
+- `remove_sensitive_info_from_text(text)`: Remove sensitive information from the input text.
+- `remove_hate_speech_from_text(text)`: Remove hate speech or offensive speech from the input text.
 """
 
 
 import string
 import re
 import nltk
+from valx import remove_profanity, remove_sensitive_information, detect_hate_speech
 from nltk.corpus import stopwords
 from nltk.tokenize import word_tokenize
 from nltk.stem import PorterStemmer, WordNetLemmatizer
@@ -244,4 +248,62 @@ def pos_tag(text):
         return tagged_words
     except Exception as e:
         print(f"An error occurred during POS tagging: {str(e)}")
-        return []
+        return []
+
+def remove_profanity_from_text(text):
+    """
+    Remove profane words from the input text.
+
+    This ensures that text is clean and does not contain inappropriate language.
+
+    Parameters:
+    - `text` (str): The input text to remove profanity from.
+
+    Returns:
+    - `text` (str): The cleaned output text.
+    """
+    sentences = nltk.sent_tokenize(text)
+    cleaned_sentences = remove_profanity(sentences, language='All')
+    cleaned_text = '. '.join(cleaned_sentences)
+
+    return cleaned_text
+
+def remove_sensitive_info_from_text(text):
+    """
+    Remove sensitive information from the input text.
+
+    This can be useful for depersonalization of text data.
+
+    Parameters:
+    - `text` (str): The input text to remove sensitive information from.
+
+    Returns:
+    - `text` (str): The cleaned output text.
+    """
+    sentences = nltk.sent_tokenize(text)
+    cleaned_sentences = remove_sensitive_information(sentences)
+    cleaned_text = '. '.join(cleaned_sentences)
+
+    return cleaned_text
+
+def remove_hate_speech_from_text(text):
+    """
+    Remove hate speech or offensive speech from the input text.
+
+    This function removes sentences, and not just a certain word, because it is context relevant.
+
+    Parameters:
+    - `text` (str): The input text to remove hate speech and offensive speech from.
+
+    Returns:
+    - `text` (str): The cleaned output text.
+    """
+    sentences = nltk.sent_tokenize(text)
+    cleaned_sentences = []
+    for sentence in sentences:
+        outcome = detect_hate_speech(sentence)
+        if outcome != ['Hate Speech'] and outcome != ['Offensive Speech'] and outcome == ['No Hate and Offensive Speech']:
+            cleaned_sentences.append(sentence)
+    cleaned_text = '. '.join(cleaned_sentences)
+
+    return cleaned_text
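
Review note: two details of `remove_hate_speech_from_text` may be worth a follow-up. First, the three-part condition reduces to the final equality test, because a value equal to `['No Hate and Offensive Speech']` can never equal either of the other two labels. Second, `nltk.sent_tokenize` keeps each sentence's terminal punctuation, so joining with `'. '` would produce text such as `Hello friend!. Goodbye`. The sketch below illustrates both points under stated assumptions: `keep_clean_sentences` and `fake_classifier` are hypothetical names, and the classifier is a stub standing in for `detect_hate_speech`, not the real ValX model.

```python
from typing import Callable, Iterable, List

# Label taken from the condition in the diff above.
CLEAN_LABEL = ["No Hate and Offensive Speech"]


def keep_clean_sentences(
    sentences: Iterable[str],
    classify: Callable[[str], List[str]],
) -> str:
    """Keep only sentences classified as clean and join them with one space.

    Sentences are assumed to already end with their own punctuation.
    """
    kept = [s for s in sentences if classify(s) == CLEAN_LABEL]
    return " ".join(kept)


def fake_classifier(sentence: str) -> List[str]:
    """Hypothetical stand-in for a real detector; flags any sentence containing 'hate'."""
    return ["Hate Speech"] if "hate" in sentence.lower() else CLEAN_LABEL


if __name__ == "__main__":
    sample = ["I hate this.", "Hi, I'm Katy.", "See you soon!"]
    print(keep_clean_sentences(sample, fake_classifier))
```

Passing the classifier in as an argument keeps the filtering logic testable without downloading a model.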

publish steps.txt

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+To publish your Python package using Twine, you'll need to perform a few steps. First, make sure you have the following prerequisites installed:
+
+Setuptools: This is typically used for packaging Python projects.
+
+bash
+Copy code
+pip install setuptools
+Wheel: This is a built-package format that can be installed with pip.
+
+bash
+Copy code
+pip install wheel
+Twine: This is a utility for publishing Python packages on the Python Package Index (PyPI).
+
+bash
+Copy code
+pip install twine
+Once you have these installed, follow these steps to publish your package:
+
+1. Package your Project
+Navigate to your project's root directory in the terminal and run the following command to create a source distribution and a wheel distribution:
+
+bash
+Copy code
+python setup.py sdist bdist_wheel
+This command will generate a dist directory containing your packaged project.
+
+2. Create a PyPI Account
+Make sure you have an account on the PyPI website. You'll need this account to upload your package.
+
+3. Upload your Package
+Use Twine to upload your package to PyPI:
+
+bash
+Copy code
+twine upload dist/* -u __token__ -p pypi-token
+This command uploads all files in the dist directory to PyPI.
+
+4. Enter your PyPI Credentials
+Twine will prompt you to enter your PyPI username and password. Enter the credentials associated with your PyPI account.
+
+5. Verify your Package on PyPI
+Visit your project's page on PyPI to verify that your package has been successfully uploaded.
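
Review note: step 3 already passes the credentials with `-u` and `-p`, so the prompt described in step 4 should only appear when those options are omitted. As a hedged alternative that keeps the token out of the command line, Twine also reads `TWINE_USERNAME` and `TWINE_PASSWORD` from the environment. The sketch below only assembles the command and environment and does not run an upload. `build_twine_upload` is a hypothetical helper, not part of this commit.

```python
import glob
import os
from typing import Dict, List, Tuple


def build_twine_upload(dist_dir: str, token: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the twine command and environment for uploading every file in dist_dir."""
    files = sorted(glob.glob(os.path.join(dist_dir, "*")))
    if not files:
        raise FileNotFoundError(f"no distributions found in {dist_dir!r}")
    command = ["twine", "upload", *files]
    # Credentials travel through the environment instead of -u / -p.
    env = {**os.environ, "TWINE_USERNAME": "__token__", "TWINE_PASSWORD": token}
    return command, env
```

The returned pair could then be handed to `subprocess.run(command, env=env, check=True)` once the `dist` directory has been built.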

readme.md

Lines changed: 19 additions & 5 deletions
@@ -1,4 +1,4 @@
-# DupliPy 0.2.0
+# DupliPy 0.2.1
 ![Python Version](https://img.shields.io/badge/python-3.12-blue.svg)
 ![Code Size](https://img.shields.io/github/languages/code-size/infinitode/duplipy)
 ![Downloads](https://pepy.tech/badge/duplipy)
@@ -7,9 +7,9 @@
 
 An open source Python library for text formatting, augmentation, and similarity calculation tasks in NLP, the package now also includes additional methods for image augmentation.
 
-## Changes to DupliPy 0.2.0
+## Changes to DupliPy 0.2.1
 
-DupliPy now includes useful method descriptions in docstrings, allowing anyone to quickly see what a method does and why it is used. DupliPy also now includes a few extra methods in `replication` and `similarity`, including `shuffle_words()` and `jaccard_similarity_score()` .
+Duplipy now utilizes another one of our Python packages, called ValX, which provides quick methods we can use to clean and format our text data before training in preprocessing steps.
 
 ## Installation
 
@@ -29,7 +29,7 @@ DupliPy supports the following Python versions:
 - Python 3.9
 - Python 3.10
 - Python 3.11
-- Python 3.12
+- Python 3.12 or later
 
 Please ensure that you have one of these Python versions installed before using DupliPy. DupliPy may not work as expected on lower versions of Python than the supported.
 
@@ -40,7 +40,8 @@ Please ensure that you have one of these Python versions installed before using
 - Sentiment Analysis: Find impressions within sentences.
 - Similarity Calculation: Calculate text similarity using various metrics.
 - BLEU Score Calculation: Calculate how well your text-based NLP model performs.
-- Image Augmentation Tasks **(NEW)**
+- Image Augmentation Tasks.
+- Profanity removal, hate speech removal, offensive speech removal, and sensitive information removal.
 
 *For full reference documentation view [DupliPy's official documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/duplipy-package-documentation).*
 
@@ -157,6 +158,19 @@ resized_image.save("path/to/resized.jpg")
 randomly_cropped_image.save("path/to/randomly_cropped.jpg")
 ```
 
+### Hate speech and Offensive speech removal using AI
+
+```python
+from duplipy.formatting import remove_hate_speech_from_text
+
+text = "I hate all of you bad word! Can't you just bad word leave me alone! Hi, I'm Katy."
+
+print(remove_hate_speech_from_text(text))
+
+### Output
+# "Hi, I'm Katy."
+```
+
 ## Contributing
 
 Contributions are welcome! If you encounter any issues, have suggestions, or want to contribute to DupliPy, please open an issue or submit a pull request on [GitHub](https://github.com/infinitode/duplipy).

setup.py

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 setup(
     name='duplipy',
-    version='0.2.0',
+    version='{{VERSION_PLACEHOLDER}}',
     author='Infinitode Pty Ltd',
     author_email='infinitode.ltd@gmail.com',
     description='A package for formatting and text replication, with added support for image augmentation.',
@@ -17,6 +17,7 @@
         'joblib',
         'tqdm',
         'pillow',
+        'valx'
     ],
     classifiers=[
         'Development Status :: 5 - Production/Stable',

test.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from duplipy.formatting import remove_hate_speech_from_text
+
+print(remove_hate_speech_from_text("Hello friend! Goodbye Fag!"))
