This should be easy but somehow I'm not quite getting it.

"Write a function sentenceCapitalizer that has one parameter of type string. The function returns a copy of the string with the first character of each sentence capitalized. The function should return "Hello. My name is Joe. What is your name?" if the argument to the function is "hello. my name is Joe. what is your name?" Assume a sentence is separated by a period followed by a space."

What I have so far is:

    def sentenceCapitalizer (string1: str):

Upon execution I get the error:

    Traceback (most recent call last):
      File "C:\Users\Andrew\Desktop\lab3.py", line 83, in <module>
      File "C:\Users\Andrew\Desktop\lab3.py", line 79, in sentenceCapitalizer
    AttributeError: 'list' object has no attribute 'capitalize'

What is that telling me and how do I fix this? I tried following instructions found on a page listed as the Python Software Foundation, so I thought I'd have this.

You are trying to use a string method on the wrong object; words is a list object containing strings. Use the method on each individual element instead:

    words2 = [word.capitalize() for word in words]

But this would be applying the wrong transformation; you don't want to capitalise the whole sentence, but just the first letter. str.capitalize() would lowercase everything else, including the J in Joe:

    >>> 'my name is Joe'.capitalize()
    'My name is joe'

Limit yourself to the first letter only, and then add back the rest of the string unchanged:

    words2 = [word[0].capitalize() + word[1:] for word in words]

A list object has no .join() method either; that too is a string method:

    string2 = '. '.join(words2)

This'll join the strings in words2 with the '. ' joiner.

You'll probably want to use better variable names here; your strings are sentences, not words, so your code could do better reflecting that.

Together that makes your function:

    def sentenceCapitalizer (string1: str):
        sentences = string1.split(". ")
        sentences2 = [sentence[0].capitalize() + sentence[1:] for sentence in sentences]
        string2 = '. '.join(sentences2)
        return string2

Demo:

    >>> print(sentenceCapitalizer("hello. my name is Joe. what is your name?"))
    Hello. My name is Joe. What is your name?

To allow arbitrary whitespace after the dot, or to capitalize the full words (it might make a difference for Unicode text), you could use regular expressions via the re module:

    #!/usr/bin/env python3
    import re

    def sentenceCapitalizer(text):
        # group 1: start of text, or a dot plus whitespace; group 2: the next word
        return re.sub(r"(^|\.\s+)(\w+)",
                      lambda m: m.group(1) + m.group(2).capitalize(),
                      text)

Note: pep8 recommends lowercase names for functions, e.g., capitalize_sentence() instead of sentenceCapitalizer().

To accept a larger variety of texts, you could use the nltk package:

    # $ pip install nltk
    # (the tokenizers also need NLTK's Punkt data, fetched with nltk.download)
    from nltk.tokenize import sent_tokenize, word_tokenize

    def sent_capitalize(sentence):
        """Capitalize the first word in the *sentence*."""
        words = word_tokenize(sentence)
        if words:
            words[0] = words[0].capitalize()
        return " ".join(words[:-1]) + "".join(words[-1:])  # dot

    text = "hello. my name is Joe. what is your name?"
    # split the text into a list of sentences
    sentences = sent_tokenize(text)
    print(" ".join(map(sent_capitalize, sentences)))

I created the following script to clean text that I scraped. The clean text would ideally be lowercase words, without numbers, and with maybe only commas and a dot at the end of a sentence. It should only have white-space between words, and all "\n" elements should be removed from the text.

Particularly, I'm interested in feedback on the following code:

    def cleaning(text):
        # remove new line and digits with regular expression
        # remove URLs
        text = re.sub(url_pattern, ' ', text)
        # remove non-ASCII characters
        text = ''.join(character for character in text if ord(character) < 128)
        # remove punctuation
        text = ''.join(character for character in text if character not in exclude)

The text is cleaned via:

    cleaner = lambda x: cleaning(x)
    df = df.apply(cleaner)
    df = df.replace('', np.nan)

So far, the script does the job, which is great. However, how could the script above be improved, or be written cleaner?
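
One possible way to tighten the cleaning function is sketched below. It is only a sketch under stated assumptions, not a definitive answer: the URL pattern is an assumption (the one used in the script is not shown), and `clean_text`, `URL_RE`, `DIGIT_RE`, `WHITESPACE_RE` and `DROP_PUNCT` are hypothetical names. The regular expressions are compiled once, `str.translate` replaces the character-by-character punctuation filter, and commas and periods are kept because the question lists them as acceptable.

```python
import re
import string

# Assumed URL pattern, for illustration only; the original script's pattern is not shown.
URL_RE = re.compile(r"(?:https?|ftp)://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
WHITESPACE_RE = re.compile(r"\s+")

# Translation table that deletes all ASCII punctuation except commas and periods.
DROP_PUNCT = str.maketrans(
    "", "", "".join(c for c in string.punctuation if c not in ",.")
)


def clean_text(text: str) -> str:
    """Lowercase text; drop URLs, digits, non-ASCII and most punctuation."""
    text = URL_RE.sub(" ", text)            # URLs first, while "://" is still intact
    text = DIGIT_RE.sub(" ", text)          # numbers
    text = text.encode("ascii", errors="ignore").decode("ascii")  # non-ASCII
    text = text.translate(DROP_PUNCT)       # punctuation other than "," and "."
    text = WHITESPACE_RE.sub(" ", text)     # newlines and runs of spaces -> one space
    return text.lower().strip()


print(clean_text("Hello\nWorld 123! Visit https://example.com/page now, please."))
# hello world visit now, please.
```

Because `lambda x: cleaning(x)` does nothing more than call `cleaning`, the function itself can be passed wherever the lambda was used. Assuming a pandas Series of strings named `s` (a hypothetical name), `s.map(clean_text)` would apply the sketch to each element.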
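
For the sentenceCapitalizer question, the list-comprehension answer indexes `sentence[0]`, which raises an IndexError when a piece is empty (for example, when the input is an empty string or ends with ". "). A minimal variant that slices instead of indexing might look like this; the name `capitalize_sentences` is hypothetical and simply follows the PEP 8 naming note.

```python
def capitalize_sentences(text: str) -> str:
    """Uppercase the first character of each sentence.

    Sentences are assumed to be separated by a period followed by a space,
    as stated in the assignment.
    """
    sentences = text.split(". ")
    # sentence[:1] is "" for an empty string, so no IndexError is raised,
    # and the rest of the sentence is left unchanged (the "J" in "Joe" survives).
    capitalized = [sentence[:1].upper() + sentence[1:] for sentence in sentences]
    return ". ".join(capitalized)


print(capitalize_sentences("hello. my name is Joe. what is your name?"))
# Hello. My name is Joe. What is your name?
```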