Use Python to calculate the tone of financial texts

I found two online resources for this task (thanks to both authors):

The first solution is way more efficient than the second, but the second is more straightforward. The first needs extra knowledge of PostgreSQL and R besides Python. I borrow from the two resources and write the Python code below.

Note: to use the Python code, you have to know how to assign the full text of an article of interest to the variable text, and how to output the total word count and the counts of positive/negative words in text.

In the first part of the code, I read the word list into a Python dictionary variable. The word list used here is expected to be a .txt file in the following format:

For accounting and finance research, a commonly used positive/negative word list was developed by Bill McDonald. See his website for download.
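As a rough sketch of the first part described above, the code below reads a word list into a Python dictionary keyed by category. The file layout is an assumption made only for this illustration: one entry per line, written as a category name, a tab, and a word. The function names parse_word_list and load_word_list are hypothetical, and the three sample words are placeholders rather than entries taken from the actual list. If your file is laid out differently, only the parsing line needs to change.

```python
from collections import defaultdict


def parse_word_list(lines):
    """Build {category: [words]} from lines shaped like 'positive<TAB>gain'.

    The tab-separated layout is an assumption for this sketch,
    not the format of any particular published word list.
    """
    word_lists = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        category, word = line.split("\t", 1)
        word_lists[category.strip().lower()].append(word.strip().lower())
    return dict(word_lists)


def load_word_list(path):
    """Read a word list file (hypothetical path) using parse_word_list."""
    with open(path, encoding="utf-8") as f:
        return parse_word_list(f)


# Placeholder sample standing in for the contents of a real .txt file
sample_lines = ["positive\tgain", "positive\timprove", "negative\tloss", ""]
word_lists = parse_word_list(sample_lines)
print(word_lists)
```

Keeping the parsing separate from the file handling makes it easy to try the function on a few lines before pointing it at a full word list.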

In the second part of the code, I create regular expressions that are used to find occurrences of positive/negative words. The last few lines of code get the counts of positive/negative words in the text.
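The second part can be sketched along the following lines. This is a minimal illustration, not necessarily the exact code from either resource: it compiles one regular expression per category by joining the words with "|" between word boundaries, then counts the matches. The names build_patterns and tone_counts, the tiny word lists, and the sample sentence are all made up for the example.

```python
import re


def build_patterns(word_lists):
    """Compile one case-insensitive whole-word regex per category."""
    patterns = {}
    for category, words in word_lists.items():
        # re.escape guards against words containing regex metacharacters
        alternation = "|".join(re.escape(w) for w in words)
        patterns[category] = re.compile(r"\b(?:" + alternation + r")\b",
                                        re.IGNORECASE)
    return patterns


def tone_counts(text, patterns):
    """Return the match count per category plus a simple total word count."""
    counts = {cat: len(p.findall(text)) for cat, p in patterns.items()}
    counts["total_words"] = len(text.split())  # whitespace-delimited tokens
    return counts


# Illustrative word lists only; a real run would use the full dictionary
word_lists = {"positive": ["gain", "improve"], "negative": ["loss", "decline"]}
patterns = build_patterns(word_lists)

text = ("Revenue gains were offset by a loss; margins improve "
        "despite the decline and another LOSS.")
counts = tone_counts(text, patterns)
print(counts)
```

Because of the word boundaries, "gains" is not counted as a match for "gain"; inflected forms are only counted if they appear in the word list themselves. A single compiled pattern per category also means the text is scanned once per category rather than once per word.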


5 Responses to Use Python to calculate the tone of financial texts

  1. Ian Gow says:

    I agree that my solution is more complex. But in part that’s because it’s a more complete solution. One has to download and process the data from Bill McDonald (“see his website for download” implies undocumented steps in the process). Then one has to organize and perhaps process the text so it can be fed to the Python function. Finally, one needs to handle the output.

    I think the first step on my site could be done in Python (rather than R … my decision to use R is more a reflection of my comparative advantage in R than anything inherent to Python). And the second step could be done without PostgreSQL (especially if the first step is done in Python). I think a “pure Python” approach would be more elegant than what I have, at least as a code illustration.

    • Kai Chen says:

      Hi Ian, happy to hear your thoughts promptly – I like your blog and really benefit from it.

      I like how you deal with the regular expression pattern. It is very efficient and avoids the need for many loops. In my experiment, your code is about 6 times faster than the other. I agree that your solution is more complete, and that reading texts from and outputting tone counts to a database is a better idea than reading/writing CSV. In my post, my code does skip the input and output parts.

  2. Mu Civ says:

    Hi Kai, I’m new to Python, so I really appreciate your code!

    Unfortunately, it doesn’t work for me. A few errors occurred:

    #1 NameError: name ‘re’ is not defined -> I added “import re”, which helped I guess

    #2 NameError: name ‘text’ is not defined -> I defined text as text = “Bsp.text” (which is the document I would like to analyse). This also seemed to help, at least the error does not occur anymore.

    #3 NameError: name ‘count’ is not defined -> I really don’t know how to fix this one though… Can you help me please?

    Thanks in advance!

    • Mu Civ says:

      Hi Kai,

      I’ve already solved my problem.

      Here is the last part of the code (if anyone should be interested):

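      # Get tone count
      # (dict and regex are the variables defined earlier in the post's code)

      # Read the full text of the document to analyse
      with open('Bsp.txt', 'r') as content_file:
          content = content_file.read()

      count = {}
      wordcount = len(content.split())  # total number of words
      for cat in dict.keys():
          # Number of matches for each category (positive/negative)
          count[cat] = len(regex[cat].findall(content))

      print(count)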

      Thanks and have a nice day. 🙂
