Chatting with GPT – Neural Networks

*NOTE: I have manually modified some of the equations produced by ChatGPT, such as adding a t-1 subscript.

Hello, please tell me what is ChatGPT?

ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model that was specifically fine-tuned for use in chatbots and conversational AI systems. It is trained on a large text dataset plus example conversations and is able to generate human-like responses to user input. ChatGPT can be used to build chatbots that hold natural-language conversations with users, helping businesses and organizations automate customer service, sales, and other types of interactions.

Like other GPT models, ChatGPT is a transformer-based language model that uses attention mechanisms to process and generate text. It is first pre-trained on a large text corpus to predict the next word in a sequence based on the words that come before it (unsupervised, or more precisely self-supervised, learning), and then fine-tuned with supervised learning on example conversations and with reinforcement learning from human feedback (RLHF). This allows ChatGPT to learn the patterns and structure of human conversation and generate appropriate responses to user input.

I would like to learn more about machine learning, let’s start with the concept of neural networks. Please explain the concept of a neural network.

A neural network is a type of machine learning model that is inspired by the structure and function of the human brain. It is composed of a large number of interconnected “neurons,” which are simple processing units that receive input, transform it, and produce output.

In a neural network, the input data is fed through a series of hidden layers, each of which is made up of a large number of neurons. These hidden layers process the input data using weighted connections between the neurons. The weights of these connections are adjusted during the training process, allowing the neural network to “learn” from the input data.

The output of the neural network is produced by the output layer, which is also made up of a number of neurons. The output layer takes the processed data from the hidden layers and produces the final output of the neural network.

Neural networks can be used for a wide variety of machine learning tasks, including classification, regression, and clustering. They are particularly powerful for tasks that require the processing of large amounts of data or the detection of patterns and relationships in data that are not immediately apparent.

Overall, the concept of a neural network is a key element of machine learning and is used in many different applications and industries.

How does a neural network look in terms of mathematical equations? What is the architecture?

The architecture of a neural network refers to the way that the neurons are organized and connected within the network. A typical neural network consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the raw input data, and the output layer produces the final output of the network. The hidden layers are located between the input and output layers and are responsible for processing the input data and producing intermediate output that is passed on to the output layer.

In terms of mathematical equations, the computation performed by a neural network can be represented by a series of matrix multiplications and element-wise non-linear functions (also called activation functions).

For example, consider a neural network with one hidden layer, an input layer with m inputs, and an output layer with n outputs. Let’s say that the input data is represented by a matrix X with dimensions N x m (where N is the number of data points, one per row), and the weights between the input layer and the hidden layer are represented by a matrix W1 with dimensions m x p (where p is the number of neurons in the hidden layer). The output of the hidden layer, an N x p matrix with one row per data point, is then given by:

h = f(XW1 + b1)

where f is the activation function, applied element-wise, and b1 is a bias vector of length p that is added to every row.

The weights between the hidden layer and the output layer are represented by a matrix W2 with dimensions p x n, and the final output of the neural network is given by:

y = f(hW2 + b2)

where b2 is a bias vector of length n.

This is just a simple example, but it illustrates the basic idea of how a neural network can be represented mathematically. In practice, neural networks can have many more layers and a much larger number of neurons, and the equations can become quite complex. However, the basic idea of using matrix multiplications and non-linear activation functions to transform the input data and produce output remains the same.
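To make the shapes concrete, here is a minimal pure-Python sketch of the one-hidden-layer forward pass h = f(XW1 + b1), y = f(hW2 + b2). It assumes one data point per row of X (so X is N x m), tanh as the activation f for both layers, and small made-up weights chosen only for illustration; the helper names (matmul, add_bias, elementwise, forward) are hypothetical.

```python
import math

def matmul(A, B):
    """Multiply matrix A (r x k) by matrix B (k x c); matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(M, bias):
    """Add the bias vector to every row of M (one row per data point)."""
    return [[x + b for x, b in zip(row, bias)] for row in M]

def elementwise(f, M):
    """Apply the activation function f to every entry of M."""
    return [[f(x) for x in row] for row in M]

def forward(X, W1, b1, W2, b2, f=math.tanh):
    h = elementwise(f, add_bias(matmul(X, W1), b1))  # hidden layer: N x p
    y = elementwise(f, add_bias(matmul(h, W2), b2))  # output layer: N x n
    return h, y

# N = 2 data points, m = 3 inputs, p = 2 hidden neurons, n = 1 output
X = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
W1 = [[0.1, -0.2],
      [0.4, 0.3],
      [-0.5, 0.2]]
b1 = [0.0, 0.1]
W2 = [[0.7],
      [-0.3]]
b2 = [0.05]

h, y = forward(X, W1, b1, W2, b2)
print(len(h), len(h[0]))  # 2 2
print(len(y), len(y[0]))  # 2 1
```

Deep-learning libraries do the same thing with optimized matrix routines; the nested lists here just keep the example dependency-free.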

What exactly do you mean by activation function? What are some examples of such functions?

An activation function is a mathematical function that is applied to the weighted sum of a neuron’s inputs (plus its bias) to produce the neuron’s output. Loosely speaking, it determines how strongly the neuron “fires” for a given input. Activation functions are a key component of neural networks because they allow the network to learn non-linear relationships in the data; without them, a stack of layers would collapse into a single linear transformation.

There are many different activation functions that can be used in neural networks, each with its own properties and characteristics. Some common examples include:

Sigmoid: The sigmoid function is a smooth, S-shaped curve that maps any input value to a value between 0 and 1. It is often used in the output layer of a binary classification neural network because it can produce a probability-like output.

Tanh: The tanh (hyperbolic tangent) function is also a smooth, S-shaped curve, but it maps input values to a range of -1 to 1. It is similar to the sigmoid function, but it allows for negative output values as well.

ReLU (Rectified Linear Unit): The ReLU function, f(x) = max(0, x), is a piecewise linear function that maps negative input values to 0 and leaves positive input values unchanged. It is a simple and efficient activation function that has been widely adopted in many neural network architectures.

Leaky ReLU: The leaky ReLU function is a variant of the ReLU function that keeps a small, non-zero slope for negative input values (for example, f(x) = 0.01x when x < 0). This can help prevent the “dying ReLU” problem, in which some neurons output 0 for every input, receive zero gradient, and therefore stop learning.

Softmax: The softmax function is a generalization of the sigmoid function that is often used in the output layer of a neural network for multi-class classification. It maps the input values to a probability distribution over the different classes, such that the sum of the probabilities is 1.

These are just a few examples of the many activation functions that are used in neural networks. The choice of activation function can have a significant impact on the performance of the network, and different activation functions may be better suited for different types of tasks and data.
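A minimal sketch of the activation functions listed above, written with the Python standard library only. The leaky ReLU slope of 0.01 is an assumed (common) default, and the max-subtraction in softmax and the two-branch sigmoid are numerical-stability choices that do not change the mathematical result.

```python
import math

def sigmoid(x):
    # Two branches avoid overflow in math.exp for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh(x):
    return math.tanh(x)  # output range (-1, 1)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # slope is the small gradient kept for negative inputs
    return x if x > 0 else slope * x

def softmax(xs):
    # Subtracting the maximum is safe because softmax is shift-invariant
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))            # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(leaky_relu(-2.0))        # -0.02
print(softmax([2.0, 1.0, 0.1]))  # three probabilities that sum to 1
```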

So how would I choose an activation function then? Would each layer have a different activation function?

The choice of activation function can have a significant impact on the performance of a neural network, and it is an important design decision that can affect the network’s ability to learn from the data and generalize to new examples.

There is no one-size-fits-all activation function that is best for all situations, and it is often necessary to experiment with different activation functions to find the one that works best for a particular task. In general, it can be helpful to consider the properties of the activation function and how they align with the characteristics of the data and the goals of the task.

For example, if you are building a neural network for a binary classification task, you may want to use a sigmoid activation function in the output layer, since it produces a probability-like output between 0 and 1 (tanh, by contrast, ranges from -1 to 1). On the other hand, if you are building a neural network for a multi-class classification task, you may want to use a softmax activation function in the output layer, since it can produce a probability distribution over the different classes.

As for the hidden layers of the network, ReLU and its variants (such as Leaky ReLU) are often a good choice, since they are simple and efficient and have been shown to work well in a wide range of tasks. However, other activation functions may also be suitable, depending on the characteristics of the data and the specific goals of the task.

In general, it is possible to use different activation functions in different layers of a neural network, and this can sometimes be helpful to improve the network’s performance. However, it is also important to be mindful of the complexity of the network and to avoid adding unnecessary complexity that could harm the network’s ability to learn effectively.

So once I have my data inputs, my data outputs, and I choose my activation functions, how do I choose the number of layers in my neural network?

The number of layers in a neural network, as well as the number of neurons in each layer, are important design choices that can impact the network’s ability to learn from the data and generalize to new examples. In general, the more layers and neurons a network has, the more capacity it has to learn from the data. However, too many layers and neurons can also make the network more prone to overfitting, which is when the network performs well on the training data but poorly on new, unseen examples.

There is no hard and fast rule for choosing the number of layers and neurons in a neural network, and it is often necessary to experiment with different configurations to find the one that works best for a particular task. However, there are a few guidelines that you can follow to help you make informed decisions:

Start with a small number of layers and neurons and gradually increase them for as long as performance on held-out validation data keeps improving. This can help you avoid the problem of overfitting, since a smaller network is less likely to overfit the data.
Consider the complexity of the task and the amount of data you have available. For more complex tasks or larger datasets, you may need a deeper network (i.e., one with more layers) to learn effectively. On the other hand, for simpler tasks or smaller datasets, a shallower network may be sufficient.
Experiment with different architectures and compare the performance of the different models. This can help you identify the best-performing configuration for your task.

Ultimately, the choice of the number of layers and neurons in a neural network is a trade-off between model capacity and generalization. You want to choose a configuration that has enough capacity to learn from the data, but not so much that it overfits the data. Finding the right balance can require some experimentation and trial and error, but with the right approach, you can build a neural network that performs well on your task.

Once I choose the number of layers, the number of neurons in each layer, and the activation functions, how do I train the model? How will the weights be calculated?

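To train a neural network, you need to adjust the weights of the connections between the neurons in such a way that the output of the network is as close as possible to the desired output for a given set of inputs. The process of adjusting the weights to minimize the difference between the predicted output and the desired output is known as optimization. The gradients of the loss with respect to every weight, which the optimization algorithms below rely on, are computed efficiently with backpropagation, which applies the chain rule layer by layer from the output back to the input.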

There are several optimization algorithms that can be used to train a neural network, each with its own pros and cons. Some common optimization algorithms include:

Stochastic Gradient Descent (SGD): SGD is a simple and widely used optimization algorithm that involves updating the weights of the network based on the gradient of the loss function with respect to the weights, estimated from a single randomly chosen training example at a time (hence “stochastic”). The loss function is a measure of how well the network is doing at predicting the desired output given the input data, and the gradient is a vector that points in the direction of the steepest increase in the loss function. SGD involves repeatedly updating the weights in the opposite direction of the gradient, in an attempt to minimize the loss function and improve the performance of the network.
Adam (Adaptive Moment Estimation): Adam is a variant of SGD that uses an adaptive learning rate, which means that the step size for updating each weight is adjusted based on past gradient information. Adam tends to converge faster and to require less hyperparameter tuning than plain SGD.
Mini-batch gradient descent: Mini-batch gradient descent is a variant of SGD that involves dividing the training data into small batches and updating the weights based on the average of the gradients of the loss function for each batch. This can be more efficient than SGD, especially for large datasets, since it allows the network to learn from the data more quickly.
RProp (Resilient Backpropagation): RProp is a gradient-based method that adapts a separate step size for each weight individually, based on the signs of the current and previous gradients rather than their magnitudes. It often converges quickly when trained with full-batch gradients on small and medium-sized problems, but it does not work well with small mini-batches, so it is rarely used for large-scale neural network training.

Please explain stochastic gradient descent further, include equations

Stochastic Gradient Descent (SGD) is a widely-used optimization algorithm for training neural networks and other machine learning models. It involves updating the weights of the model based on the gradient of the loss function with respect to the weights. The loss function is a measure of how well the model is doing at predicting the desired output given the input data, and the gradient is a vector that points in the direction of the steepest increase in the loss function.

SGD works by iteratively updating the weights of the model in the opposite direction of the gradient, in an attempt to minimize the loss function and improve the performance of the model. The update rule for SGD can be expressed as follows:

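w = w - alpha * gradient

where w is the weight of the model, alpha is the learning rate (a hyperparameter that controls the step size for updating the weights), and gradient is the gradient of the loss function with respect to the weight. In SGD this gradient is computed on one randomly chosen training example (or, in mini-batch gradient descent, averaged over a small random batch), which makes each update cheap but noisy.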

The learning rate is an important hyperparameter in SGD, and it determines the step size for updating the weights. A larger learning rate can lead to faster convergence, but it can also make the optimization process more unstable. On the other hand, a smaller learning rate can lead to slower convergence, but it can also make the optimization process more stable.

SGD can be implemented in a number of ways, and it is often used in conjunction with other techniques, such as mini-batch gradient descent or momentum, to improve the efficiency and stability of the optimization process.
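A minimal sketch of the update w = w - alpha * gradient, applied to a toy problem: fitting a line y = w * x + b by minimizing mean squared error. The toy data, the function name, and the hyperparameter values are assumptions for illustration; batch_size=1 gives plain SGD, and a larger batch_size gives mini-batch gradient descent.

```python
import random

def sgd_linear_regression(xs, ys, alpha=0.05, epochs=200, batch_size=1, seed=0):
    """Fit y = w * x + b by minimizing the mean squared error with SGD.

    batch_size=1 is plain stochastic gradient descent; a larger batch_size
    gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a random order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch-average of (w * x + b - y)^2 with respect to w and b
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # The SGD update rule: parameter = parameter - alpha * gradient
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

With real, noisy data the per-example updates keep jittering around the optimum, which is one reason the learning rate is often decayed over time or mini-batches are used.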

You also mentioned ADAM – Adaptive Moment Estimation, how does this algorithm differ from SGD? Please provide equations.

Adam (Adaptive Moment Estimation) is a variant of Stochastic Gradient Descent (SGD) that uses an adaptive learning rate, which means that the step size for updating the weights is adjusted based on the past gradient information. Adam is a popular optimization algorithm that is often used to train neural networks and other machine learning models.

One of the key differences between Adam and SGD is that Adam keeps exponentially weighted averages of the past gradients and of the past squared gradients and uses them to scale the step for each weight individually, whereas plain SGD applies the same fixed learning rate to every weight. This allows Adam to automatically adjust the step size based on the characteristics of the data and the optimization process, which can lead to faster convergence and better performance.

The update rule for Adam can be expressed as follows:

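g_t = gradient of the loss function with respect to w at step t

m_t = beta1 * m_t-1 + (1 - beta1) * g_t

v_t = beta2 * v_t-1 + (1 - beta2) * g_t^2

m_hat_t = m_t / (1 - beta1^t)

v_hat_t = v_t / (1 - beta2^t)

w_t = w_t-1 - alpha * m_hat_t / (sqrt(v_hat_t) + epsilon)

where w is the weight of the model, alpha is the learning rate (a hyperparameter that controls the step size for updating the weights), m_t and v_t are the exponentially weighted averages of the past gradients and squared gradients, respectively (both start at 0), m_hat_t and v_hat_t are bias-corrected versions that compensate for this zero initialization in the early steps, beta1 and beta2 are hyperparameters that control the decay rate of the moving averages (commonly 0.9 and 0.999), and epsilon is a small value (commonly 1e-8) that is used to prevent division by zero.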

Adam uses two moving averages, one for the gradients and one for the squared gradients, which allows it to incorporate both the mean (first moment) and the uncentered variance (second moment) of the past gradients into the step size. This can help to stabilize the optimization process and prevent oscillations or divergence.
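A minimal sketch of the Adam equations for a single weight, including the bias correction. The test function f(w) = (w - 3)^2 is made up for illustration, and alpha = 0.1 is an assumption: it is larger than the commonly recommended 0.001 so that this toy problem converges within a few hundred steps.

```python
import math

def adam_minimize(grad, w0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function given its gradient, using the Adam update."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g   # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w

def grad_f(w):
    # Gradient of f(w) = (w - 3)^2
    return 2 * (w - 3)

w = adam_minimize(grad_f, w0=0.0)
print(w)  # close to the minimum at w = 3
```

Note that in the first step m_hat / sqrt(v_hat) equals g / |g|, so every weight initially moves by about alpha regardless of the gradient's scale.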

You also listed RProp (Resilient Backpropagation) as an optimization algorithm; please explain RProp with equations.

Resilient backpropagation (RProp) is an optimization algorithm for training artificial neural networks in supervised learning. It is an extension of gradient-descent training with backpropagation that aims to improve the convergence speed of training by adapting a separate step size for each weight in the network.

In standard gradient descent with backpropagation, the gradient of the loss function is calculated using the chain rule, and the weights are updated using the following equation:

weight(t+1) = weight(t) - learning_rate * gradient(t)

Here the learning rate is a fixed hyperparameter that is set before training and shared by all weights, and the size of each step also depends on the magnitude of the gradient. RProp changes both aspects: it ignores the magnitude of the gradient and uses only its sign, and it keeps a separate step size for each weight. The step size of a weight is increased when the sign of its partial derivative stays the same from one iteration to the next (the weight keeps moving in a useful direction), and decreased when the sign flips (the last step jumped over a minimum). Because each weight’s step size adapts to that weight’s own convergence behavior rather than to a single global learning rate, this can improve the convergence speed of training.

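The step size of each weight i is updated using the following rules, where g_t(i) = dE_t / dw(i)_t is the partial derivative of the error E with respect to weight i at iteration t, and 0 < eta_minus < 1 < eta_plus (commonly eta_minus = 0.5 and eta_plus = 1.2):

step_size(i)_t = min(eta_plus * step_size(i)_t-1, step_max)    if g_t(i) * g_t-1(i) > 0
step_size(i)_t = max(eta_minus * step_size(i)_t-1, step_min)   if g_t(i) * g_t-1(i) < 0
step_size(i)_t = step_size(i)_t-1                              otherwise

The weight is then moved against the sign of its partial derivative by that step size:

w(i)_t+1 = w(i)_t - sgn(g_t(i)) * step_size(i)_t

Variants differ in what happens right after a sign change; for example, the original algorithm also reverts the previous weight step.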
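A minimal sketch of these rules, using the simple variant often called Rprop- (no weight backtracking after a sign change). The test function, with very different curvature along each weight, and the default values step0 = 0.1, step_min = 1e-6 and step_max = 50 are assumptions for illustration.

```python
def rprop_minimize(grad, w0, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0, iters=100):
    """Minimize a function of several weights with the simple Rprop- rule.

    grad(w) must return the list of partial derivatives at w. Only the sign
    of each partial derivative is used; its magnitude is ignored.
    """
    w = list(w0)
    step = [step0] * len(w)    # one adaptive step size per weight
    prev_g = [0.0] * len(w)    # partial derivatives from the previous iteration
    for _ in range(iters):
        g = grad(w)
        for i in range(len(w)):
            if g[i] * prev_g[i] > 0:      # same sign as last time: speed up
                step[i] = min(step[i] * eta_plus, step_max)
            elif g[i] * prev_g[i] < 0:    # sign flipped: overshot a minimum, slow down
                step[i] = max(step[i] * eta_minus, step_min)
            # Move against the sign of the partial derivative by this weight's step size
            if g[i] > 0:
                w[i] -= step[i]
            elif g[i] < 0:
                w[i] += step[i]
        prev_g = g
    return w

def grad_f(w):
    # Gradient of f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2
    return [2 * (w[0] - 3), 20 * (w[1] + 1)]

w = rprop_minimize(grad_f, [0.0, 0.0])
print(w)  # close to [3, -1]
```

Because only signs are used, both weights start with the same step even though the second partial derivative is ten times steeper; each step size then grows or shrinks on its own.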

I have also often heard the AdaGrad optimizer mentioned; please explain it to me.

In AdaGrad, the learning rate is not fixed and is instead adapted for each weight based on the history of the gradients of the loss function with respect to that weight. Specifically, the learning rate is adjusted based on the sum of the squares of the gradients of the loss function with respect to the weight.

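Each weight is updated using the following equations:

G_t = G_t-1 + gradient(t)^2

weight(t+1) = weight(t) - alpha / (sqrt(G_t) + epsilon) * gradient(t)

where gradient(t) is the gradient of the loss function with respect to the weight at time t, G_t is the sum of the squares of these gradients up to time t, alpha is a base learning rate, and epsilon is a small value that prevents division by zero. The effective learning rate of the weight at time t is therefore alpha / (sqrt(G_t) + epsilon).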

This means that the effective learning rate for each weight is inversely proportional to the square root of the sum of the squares of its past gradients. Weights that have had large gradients in the past take smaller steps, while weights that have had small gradients keep relatively larger steps. Because G_t can only grow, the effective learning rate of every weight decreases monotonically during training, which can make learning stall in long training runs.
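A minimal sketch of the AdaGrad update for several weights. The base learning rate alpha = 0.5 and the two-weight test function are assumptions chosen only to make the example converge quickly.

```python
import math

def adagrad_minimize(grad, w0, alpha=0.5, eps=1e-8, steps=500):
    """Minimize a function of several weights with AdaGrad."""
    w = list(w0)
    G = [0.0] * len(w)  # running sum of squared gradients, one per weight
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            G[i] += g[i] ** 2
            # Effective learning rate alpha / (sqrt(G) + eps) can only shrink over time
            w[i] -= alpha / (math.sqrt(G[i]) + eps) * g[i]
    return w

def grad_f(w):
    # Gradient of f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2
    return [2 * (w[0] - 3), 20 * (w[1] + 1)]

w = adagrad_minimize(grad_f, [0.0, 0.0])
print(w)  # close to [3, -1]
```

The steeper second weight accumulates a larger G, so its effective learning rate ends up smaller, which is exactly the per-weight scaling described above.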

I have also heard of gradient boosting, explain that

Gradient boosting is a machine learning technique for building predictive models by creating an ensemble of weak learners and combining their predictions to create a stronger, more accurate model. The general idea of gradient boosting is to train a sequence of weak learners (e.g. decision trees) in a stage-wise manner, with each subsequent learner trying to correct the errors made by the ensemble so far. At each stage, the gradient boosting algorithm fits a new weak learner to the residual errors of the current ensemble, i.e., the difference between the true values and the current predicted values of the target variable. For squared-error loss, these residuals are (up to a constant factor) the negative gradient of the loss with respect to the predictions; for other loss functions, the new learner is fit to that negative gradient, which is where the name comes from.

Unlike AdaBoost, gradient boosting does not reweight the training samples. Samples that the current ensemble predicts poorly simply have large residuals, so the next weak learner automatically focuses on them.

The steps of the gradient boosting algorithm can be summarized in the following equations:

1. Initialize the ensemble with a single weak learner, such as a decision tree, and make predictions using this learner:

prediction(1) = weak_learner_prediction(1)

2. Calculate the residual errors between the true values and the current predicted values of the target variable:

residual_error(t) = true_value - prediction(t)

3. Train a new weak learner to predict the residual errors from the previous step:

weak_learner_prediction(t+1) = train(weak_learner, residual_error(t))

4. Update the predictions of the ensemble by adding the predictions of the new weak learner to the previous predictions:

prediction(t+1) = prediction(t) + weak_learner_prediction(t+1)

5. Repeat steps 2-4 until the desired number of weak learners has been trained.
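The steps above can be sketched in pure Python for 1-D regression. The weak learner is a hypothetical helper, fit_stump, a depth-1 regression tree (decision stump); the toy data and the learning_rate parameter are assumptions. The ensemble starts from a prediction of 0, so the first stump is fit to the targets themselves, which plays the role of the initialization step. With learning_rate=1.0 (the default) the update is exactly prediction(t+1) = prediction(t) + weak_learner_prediction(t+1); values below 1 (shrinkage) are common in practice.

```python
def fit_stump(xs, residuals):
    """Hypothetical weak learner: a regression stump on 1-D inputs.

    Tries every split between neighboring x values and predicts the mean
    residual on each side. Needs at least two distinct x values.
    """
    points = sorted(zip(xs, residuals))
    best = None
    for k in range(1, len(points)):
        threshold = (points[k - 1][0] + points[k][0]) / 2
        left = [r for x, r in points if x <= threshold]
        right = [r for x, r in points if x > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def gradient_boost(xs, ys, n_learners=50, learning_rate=1.0):
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(n_learners):
        # Residuals of the current ensemble (the negative gradient of squared error / 2)
        residuals = [y - p for y, p in zip(ys, preds)]
        # Train a new weak learner on the residuals
        h = fit_stump(xs, residuals)
        learners.append(h)
        # Add its (optionally shrunken) predictions to the ensemble's predictions
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]

    def predict(x):
        return learning_rate * sum(h(x) for h in learners)
    return predict

xs = list(range(10))
ys = [x * x for x in xs]
model = gradient_boost(xs, ys, n_learners=100)
print([round(model(x), 2) for x in xs])  # approaches ys as learners are added
```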

What is AdaBoost and what are the steps of the algorithm? Include equations.

AdaBoost (Adaptive Boosting) is a boosting algorithm for classification that combines many weak learners into a strong classifier. It works by weighting the observations in the training set and adjusting the weight of each observation at each iteration of the algorithm based on the performance of the previous models. The idea behind AdaBoost is to train a sequence of weak learners in an iterative manner, with each learner focusing on the mistakes of the previous ones. The final model is a weighted vote of all the weak models, in which more accurate learners get larger weights.

Here are the steps of the AdaBoost algorithm:

1. Initialize the weights of the observations in the training set:
- At the beginning, all N observations in the training set are given the same weight, 1/N.
2. Train a weak learner:
- Train a weak learner (e.g., a decision stump) on the training set using the current weights of the observations (My note: e.g., by drawing a sample with replacement in which observations are picked with probability proportional to their weights).
3. Calculate the error of the weak learner:
- Calculate the error (my note: total error) on the training set using the following equation:
  total_error = sum(weights[i] * I[prediction[i] != y[i]])
  where I is the indicator function that returns 1 if the prediction is incorrect and 0 if the prediction is correct, weights is the array of (normalized) weights for the observations, prediction is the array of predictions made by the weak learner, and y is the array of true labels for the observations.
4. Calculate the weight of the weak learner:
- Calculate the weight of the weak learner using the following equation:
  alpha = 0.5 * log((1 - total_error) / total_error)
5. Adjust the weights of the observations:
- Increase the weights of misclassified observations and decrease the weights of correctly classified ones:
  new_weights[i] = weights[i] * exp(alpha)    if prediction[i] != y[i]
  new_weights[i] = weights[i] * exp(-alpha)   if prediction[i] == y[i]
  With labels and predictions coded as -1 and +1, both cases can be written as new_weights[i] = weights[i] * exp(-alpha * y[i] * prediction[i]).
6. Normalize the weights:
- Normalize the weights so that they sum up to 1.
7. Repeat steps 2-6:
- Repeat steps 2-6 a predetermined number of times or until the error of the model becomes small enough.
8. Return the final model:
- Return the final model as a weighted vote of all the weak models: final_prediction(x) = sign(sum over t of alpha_t * h_t(x)), where h_t is the weak learner from round t and alpha_t is its weight.

My note:

Short note on the weight update factor, where the stump’s performance is its weight alpha:

Misclassification, factor exp(alpha): When the performance is relatively large, the last stump did a good job of classifying the records, so the new sample weight will be much larger than the old one. When the performance is relatively low, the last stump did not do a good job of classifying the records, so the new sample weight will only be a little larger than the old one.

No misclassification, factor exp(-alpha): When the performance is relatively large, the last stump did a good job of classifying the records, so the new sample weight will be much smaller than the old one. When the performance is relatively small, the last stump did not do a good job of classifying the records, so the new sample weight will only be a little smaller than the old one.
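A minimal pure-Python sketch of the AdaBoost steps for 1-D data with labels coded as -1 and +1, so that exp(-alpha * y * prediction) gives the exp(alpha) and exp(-alpha) factors described in the notes. The weak learner is a hypothetical threshold stump trained directly on the sample weights (rather than on a weighted resample), and the tiny dataset is made up for illustration.

```python
import math

def fit_weighted_stump(xs, ys, weights):
    """Hypothetical weak learner: h(x) = polarity * (+1 if x > t else -1),
    choosing t and polarity to minimize the weighted error."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            preds = [polarity * (1 if x > t else -1) for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    _, t, polarity = best
    return lambda x: polarity * (1 if x > t else -1)

def adaboost(xs, ys, n_rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n                       # equal initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = fit_weighted_stump(xs, ys, weights)   # train a weak learner
        preds = [h(x) for x in xs]
        total_error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        total_error = min(max(total_error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - total_error) / total_error)
        # Misclassified: times exp(alpha); correctly classified: times exp(-alpha)
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]    # normalize to sum to 1
        learners.append(h)
        alphas.append(alpha)

    def predict(x):
        # Weighted vote of all weak learners
        score = sum(a * h(x) for a, h in zip(alphas, learners))
        return 1 if score >= 0 else -1
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump can separate these labels
predict = adaboost(xs, ys, n_rounds=3)
print([predict(x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Here no single stump gets more than 4 of the 6 labels right, but after three rounds the weighted vote of three stumps classifies all six correctly.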

What is the difference between gradient boosting algorithm and adaptive boosting?

Gradient boosting and adaptive boosting are both ensemble learning algorithms that can be used to improve the performance of a weak learner (a model that performs only slightly better than random guessing) by combining the predictions of multiple weak learners.

The main difference between gradient boosting and adaptive boosting is the way in which the weak learners are trained and combined.

In gradient boosting, the weak learners are trained sequentially, and each new learner is fit to the residual errors (more generally, the negative gradient of the loss) of the current ensemble; its predictions are then added to the ensemble’s predictions, and the training examples themselves are never reweighted. In adaptive boosting (also known as AdaBoost), the weak learners are also trained sequentially, but the training process is different. Each weak learner is trained on the full training set (or a weighted resample of it) using example weights, and the weights are adjusted after each learner is trained to give more weight to the examples that were misclassified. The final prediction is made by combining the predictions of all of the learners using a weighted sum, where the weight of each learner is determined by its weighted error on the training data.

Off-topic, using NLP to summarize Russian propaganda articles on the invasion of Ukraine

I have a pretty simple article summarizer in Python that I use for my blog. It’s not machine learning; it’s pretty basic: it collects N articles based on a search query, cleans up the text, tokenizes it, and removes stop words. Then word frequencies are calculated, and each sentence is scored based on those frequencies. The top N sentences are selected for the summary.
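The summarizer’s own code isn’t part of this post, so here is only a minimal sketch of the frequency-based sentence scoring it describes. The regex sentence splitting, the tiny STOP_WORDS set, and scaling counts by the most frequent word are assumptions, not details of the original code, and collecting articles from a search query is left out. Python’s \w matches Unicode letters, so the same tokenizer also works on Cyrillic text (with a Russian stop-word list).

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real summarizer would use a fuller one
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "for", "on", "with", "as", "was", "be"}

def summarize(text, n_sentences=3):
    # Split into sentences after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Tokenize into lowercase words and drop stop words
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    if not freq:
        return []
    max_freq = max(freq.values())
    rel = {w: c / max_freq for w, c in freq.items()}  # relative frequencies in (0, 1]
    # Score each sentence as the sum of its words' relative frequencies
    scores = {i: sum(rel.get(w, 0.0) for w in re.findall(r"\w+", s.lower()))
              for i, s in enumerate(sentences)}
    # Take the top-N sentences and present them in their original order
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return [sentences[i] for i in top]

text = ("Yerba mate contains caffeine. Caffeine affects mood. The sky is blue. "
        "Yerba mate tea contains caffeine and theobromine.")
print(summarize(text, n_sentences=2))
```

Summing word scores favors longer sentences, which is one reason to cap sentence length before scoring.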

I usually use it to summarize articles on some specific supplements, but today for fun I decided to try a different search query. My query was “путин россия украина спецоперация” (“Putin Russia Ukraine special operation”). I think that even though this summarizer is pretty simple, the summary turned out to be pretty meaningful. It gives a good overview of the sort of information you would get if you were reading Russian news. I selected the top 15 sentences for the summary.

As for the president’s words about the special operation in Donbas, they show that our servicemen are once again saving the world from the brown plague of Nazism, and all the assigned tasks will, without any doubt, be accomplished. The Supreme Commander-in-Chief once again confirmed that the special operation in Ukraine will be brought to its logical conclusion according to the tasks he announced on February 24, the day the SMO [special military operation] began,” the colonel said. And in his speech the president pointed to allies in Europe, Asia, Africa and Latin America who do not bend to the so-called hegemon, choose a sovereign path of development, and want to resolve security issues collectively and build a multipolar world.

Ukrainian outlets claim that the NSDC [National Security and Defense Council of Ukraine] approved sanctions against Patriarch Kirill, as well as Sergei Kiriyenko, Yevgeny Prigozhin, Roman Abramovich, Oleg Deripaska, Mikhail Fridman, Viktor Medvedchuk, and also Viktor Yanukovych. Our duty to the memory of the millions of victims of the Second World War is to respond firmly to attempts to falsify history and to counter the spread of any form of neo-Nazism, Russophobia and racism,” the president urged. Everything indicates that the time, human, material and diplomatic resources of the “special operation” are close to exhaustion, and Putin is taking a decisive step to finish it as soon as possible, locking in the gains and losses. “Our task, our mission, that of the soldiers and the militia of Donbas, is to stop this war, to protect people and, of course, to protect Russia itself,” Putin emphasized.

Putin’s speech demonstrates the readiness of the Russian Armed Forces to carry the special military operation in Ukraine through to a victorious end, military expert, intelligence veteran and colonel Anatoly Matviychuk explained to URA.RU. The war unleashed by the West and the Kyiv junta will be ended,” Krasov concluded. Russian high-precision weapons are years and even decades ahead of their foreign counterparts and significantly surpass them in tactical and technical characteristics, and the defense industry lies at the foundation of Russia’s sovereignty, the president noted.

Speaking of steps taken by the West, Vasily Piskarev, chairman of the State Duma commission investigating foreign interference in Russia’s internal affairs, had earlier stated that foreign NGOs are implanting radical ideologies in Russian society, RAPSI reports. “At first the collective West went out of its way to prove that it had supposedly ‘torn the Russian economy to shreds’; now they think they have isolated our country from the rest of the world. The Armed Forces of Russia reliably defend their country and bring freedom to other peoples, Putin said, opening the “Army-2022” forum and the International Army Games 2022. “Russia has several military options in reserve, and for the US and its NATO allies these scenarios are a serious cause for concern,” the magazine writes. Although the West was encouraged by the recent “successes” of the AFU [Armed Forces of Ukraine], US General Mark Milley urged caution about Russia’s unpredictability.

Yerba Mate (Ilex paraguariensis) articles summary using NLP

The following summary was created using a Google search for specific phrases and then performing natural language processing steps for sentence scoring. Yerba mate is an evergreen tree/shrub that grows in subtropical regions of South America. The leaves of the plant are used to make tea. Yerba mate tea contains caffeine and theobromine, which are known to affect mood. I was interested in summarizing the existing articles on research into this plant’s use in psychiatry.

The first search phrase used was “yerba mate psychiatry depression research evidence”, and the number of collected articles for this phrase was 18. The text from all articles was combined, and relative word frequencies were calculated (after removing stop words). These relative frequencies were then used to score each sentence. The sentence length distribution was checked, and its 90th percentile (30 words) was used as the maximum sentence length, so longer sentences were excluded. Below are the 10 highest-scoring sentences that summarize the text from the 18 articles.
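A minimal sketch of the length cutoff step: compute a percentile of the sentence word counts and drop longer sentences before scoring. The nearest-rank percentile definition and the function names are assumptions; other definitions (for example, linear interpolation) give slightly different cutoffs.

```python
import math
import re

def nearest_rank_percentile(values, q):
    """Return the q-th percentile (0 < q <= 100) of values using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def drop_long_sentences(sentences, q=90):
    """Keep only sentences whose word count is at most the q-th percentile length."""
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    max_len = nearest_rank_percentile(lengths, q)
    kept = [s for s, n in zip(sentences, lengths) if n <= max_len]
    return max_len, kept

# Ten toy "sentences" with 1, 2, ..., 10 words
sentences = [" ".join(["word"] * k) + "." for k in range(1, 11)]
max_len, kept = drop_long_sentences(sentences)
print(max_len, len(kept))  # 9 9
```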

We can infer from the summary that studies have been performed using yerba mate extract on rats, with tasks chosen as proxies for the rats’ depression and anxiety levels. There are no mentions of human studies in the summary. Also, the chosen sentences indicate that, based on these studies, yerba mate has potential antidepressant activity, and it may improve memory as well. The results of the anxiety study were not mentioned, and it’s not clear whether there were any side effects from yerba mate. These results are in line with the descriptions of personal experiences by Reddit users that I have reviewed, as many report better mood and improved focus after drinking yerba mate tea. Some users do report increased anxiety correlated with yerba mate consumption.

View abstract. J Agric.Food Chem. Vitamin C Levels Cerebral vitamin C (ascorbic acid (AA)) levels were determined as described by Jacques-Silva et al. Conclusion: In conclusion, the present study showed that Ilex paraguariensis presents an important effect on reducing immobility time on forced swimming test which could suggest an antidepressant-like effect of this extract. Despite previous some studies show the antidepressant-like activity of flavonoids [31, 32] which are present in the extract of I. paraguariensis, any study has evaluated the possible antidepressant-like activity of it. The presence of nine antioxidants compounds was investigated, namely, gallic acid, chlorogenic acid, caffeic acid, catechin, quercetin, rutin, kaempferol, caffeine, and theobromine. Abstract In this study, we investigated the possible antidepressant-like effect of I. paraguariensis in rats. Another study showed that an infusion of I. paraguariensis can improve the memory of rats treated with haloperidol and this effect was related to an indirect modulation of oxidative stress . In addition to flavonoids as quercetin and rutin and phenolic compounds as chlorogenic and caffeic acids, yerba mate is also rich in caffeine and saponins . After four weeks, behavioral analysis of locomotor activity and anxiety was evaluated in animals receiving water (n = 11) or I. paraguariensis (n = 9). In the same way, we evaluated if the presence of stimulants compounds like caffeine and theobromine in the extract of I. paraguariensis could cause anxiety. In the present study, we evaluated the possible antidepressant-like effect of I. paraguariensis by using forced swimming test (FST) in rats. Forced Swimming Test This experiment was performed using the FST according to the method previously published by Porsolt et al. In this context, Yerba mate (Ilex paraguariensis) is a beverage commonly consumed in South America especially in Argentina, Brazil, Uruguay, and Paraguay. I. paraguariensis reduced the immobility time on forced swimming test without significant changes in locomotor activity in the open field test.

I also tried several other search phrases, such as “yerba mate mood anxiety evidence” and “yerba mate side effects evidence”. In total, 17 articles were collected for the first query and 19 articles for the second query. The summaries are presented below. There was nothing in the first summary directly discussing mood or anxiety, but there are mentions of neuroprotective and antioxidant effects. We can also learn that a cup (about 150 mL) of yerba mate tea contains about 80 mg of caffeine, comparable to a cup of coffee, and that drinking yerba mate is not recommended for children or for women who are pregnant or breastfeeding. As in the previous summary, most of the studies mentioned appear to be animal studies (rats), although one sentence refers to yerba mate tea consumption in patients with type 2 diabetes. The side effects query summary mentions the risk of transferring caffeine from the tea to the fetus during pregnancy, as well as a link to cancer for those who drink both alcohol and yerba mate. It also mentions that anxiety and nervousness are side effects of excessive yerba mate tea consumption.

Query 1:
View abstract. J Agric.Food Chem. On the other hand, studies conducted on an animal model showed chemopreventive effects of both pure mate saponin fraction and Yerba Mate tea in chemically induced colitis in rats. Yerba Mate Nutrition Facts The following nutrition information is provided by the USDA for one cup (12g) of a branded yerba mate beverage (Mate Revolution) that lists just organic yerba mate as an ingredient. Researchers found that steeping yerba mate (such as in yerba mate tea) may increase the level of absorption. Yerba mate beverages are not recommended for children and women who are pregnant or breastfeeding. Chlorogenic acid and theobromine tested individually also had neuroprotective effects, but slightly weaker than Yerba Mate extract as a whole, but stronger than known neuroprotective compounds, such as caffeine [ 83 ]. The caffeine content in a cup (about 150 mL) of Yerba Mate tea is comparable to that in a cup of coffee and is about 80 mg [ 1 , 11 , 20 ]. In aqueous and alcoholic extracts from green and roasted Yerba Mate, the presence of chlorogenic acid (caffeoylquinic acid), caffeic acid, quinic acid, dicaffeoylquinic acid, and feruloylquinic acid was confirmed. After consumption of Yerba Mate tea, antioxidant compounds are absorbed and appear in the circulating plasma where they exert antioxidant effects [ 55 ]. According to the cited studies, Yerba Mate tea consumption attenuates oxidative stress in patients with type 2 diabetes, which may prevent its complications.

Query 2:
View abstract. J Agric.Food Chem. Because yerba mate has a high concentration of caffeine, drinking mate tea while pregnant can increase the risk of transferring caffeine to the fetus. J Ethnopharmacol. South Med J 1988;81:1092-4.. View abstract. J Am Coll Nutr 2000;19:591-600.. View abstract. Am J Med 2005;118:998-1003.. View abstract. J Psychosom Res 2003;54:191-8.. View abstract. Yerba mate consumed by those who drink alcohol is linked to a higher risk of developing cancer. Anxiety and nervousness are a side effect of excessive yerba mate tea consumption.

NLP: Summarizing l-theanine articles

In this post I will describe my use of NLP (natural language processing, not neuro-linguistic programming; natural language processing is cool, while neuro-linguistic programming is some pseudoscience stuff) to summarize articles from the internet. Specifically, I chose the topic of l-theanine and psychiatry, since I have previously summarized the Nootropics subreddit discussions on l-theanine. The next step, therefore, is to summarize existing articles on this topic.

Previous post: Summarizing experience with green tea from the Nootropics subreddit

The first step was to perform an automated Google search for a specific term. I chose the term “l-theanine psychiatry” and set the number of unique urls to 15. Some of the resulting urls are listed below:

Can L-Theanine Help Treat Symptoms of Bipolar Disorder?

Effects of L-Theanine Administration on Stress-Related Symptoms and Cognitive Functions in Healthy Adults: A Randomized Controlled Trial

L-theanine

How does the tea L-theanine buffer stress and anxiety

It can be seen that the article titles are quite relevant to our topic. The next step is formatting the text and summarizing the information.

The idea behind the summarization technique is to calculate the frequency of each word in the combined text of all articles (after stop word removal), and then select the words in the top 10% by frequency. These words are used to score each sentence. More frequent words are given more importance, as they are deemed more relevant to the chosen topic, so sentences containing those words receive higher scores. This is not a machine learning approach, but a basic frequency count method. In total, 148 words were used for sentence scoring. Some of the most frequent words (from all articles combined) are listed below:

Theanine, administration, effects, placebo, weeks, study, four, sleep, scores, cognitive, may, stress, function, fluency, studies, related, symptoms, participants, bacs, anxiety
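The scoring scheme described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code used for this post: the stop-word list is a tiny hypothetical one, sentences are split with a naive regular expression, "top 10%" is interpreted as the most frequent 10% of the vocabulary, and a sentence's score is assumed to be the sum of the frequencies of the keywords it contains.

```javascript
// Minimal sketch of frequency-based extractive summarization.
// Assumptions: tiny illustrative stop-word list, naive sentence splitting,
// sentence score = sum of the frequencies of the keywords it contains.
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "or", "of", "in", "on", "to", "for", "with",
  "was", "were", "is", "are", "be", "by", "as", "at", "that", "this", "it",
]);

// Lowercase word tokens (hyphenated words such as "l-theanine" kept whole),
// with stop words removed.
function tokenize(text) {
  return (text.toLowerCase().match(/[a-z0-9-]+/g) || []).filter(
    (w) => !STOP_WORDS.has(w)
  );
}

// Word frequencies over the combined text of all articles.
function wordFrequencies(text) {
  const freq = new Map();
  for (const w of tokenize(text)) freq.set(w, (freq.get(w) || 0) + 1);
  return freq;
}

// Keep the most frequent `fraction` of the vocabulary as scoring keywords.
function topKeywords(freq, fraction = 0.1) {
  const sorted = [...freq.entries()].sort((a, b) => b[1] - a[1]);
  const n = Math.max(1, Math.ceil(sorted.length * fraction));
  return new Map(sorted.slice(0, n));
}

// Score every sentence and return the `k` highest-scoring ones,
// in descending score order.
function summarize(text, k = 15, fraction = 0.1) {
  const keywords = topKeywords(wordFrequencies(text), fraction);
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.trim());
  return sentences
    .map((s) => ({
      s,
      score: tokenize(s).reduce((acc, w) => acc + (keywords.get(w) || 0), 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.s);
}
```

Calling `summarize(combinedText, 15)` on the combined article text (`combinedText` is a hypothetical variable) would return the 15 highest-scoring sentences; returning them in score order rather than document order is also an assumption.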

BACS was one of the most frequent words; it stands for the Brief Assessment of Cognition in Schizophrenia. Once each sentence was scored, the 15 highest-scoring sentences were selected to create a summary. The summary of the articles is presented below. From the summary we can infer that l-theanine was studied for its effects on cognition, anxiety, and stress. Many of the selected sentences come from a single randomized, placebo-controlled crossover trial in healthy adults (PMID: 31623400), which reported significantly greater improvements with l-theanine than with placebo in BACS verbal fluency (especially letter fluency), executive function, and some sleep-related stress measures. Other measures, such as cortisol and immunoglobulin A levels and the Trail Making and Stroop tests, showed no significant effect of l-theanine compared to placebo.

Second, only about 20% of symptoms (the PSQI subscales) and cognitive functions (the BACS verbal fluency, especially letter fluency and executive function) scores showed significant changes after L- theanine administration compared to the placebo administration, suggesting that the effects are not large on daily function of the participants.

Although psychotropic effects were observed in the current study, four weeks L-theanine administration had no significant effect on cortisol or immunoglobulin A levels in the saliva or serum, which was inconsistent with previous studies reporting that salivary cortisol [34] and immunoglobulin A [33] levels were reduced after acute L-theanine administration.

Considering the comparison to the placebo administration, the current study suggests that the score for the BACS verbal fluency, especially letter fluency, but not the Trail Making Test, Stroop test, or other BACS parameters, significantly changes in response to the 4 weeks effects of L-theanine.

The BACS verbal fluency, especially letter fluency (p = 0.001), and executive function scores were significantly increased after L-theanine administration (p = 0.001 and 0.031, respectively; ), while the Trail Making Test A and B scores were significantly improved after placebo administration (p = 0.042 and 0.038, respectively).

When score reductions in the stress-related symptoms were compared between L-theanine and placebo administrations, changes in the PSQI sleep latency, sleep disturbance, and use of sleep medication subscales were significantly greater (p = 0.0499, 0.046, and 0.047, respectively), while those in the SDS and PSQI scores showed a non-statistically significant trend towards greater improvement (p = 0.084 and 0.073, respectively), during the L-theanine period compared to placebo.

Stratified analyses revealed that scores for verbal fluency (p = 0.002), especially letter fluency (p = 0.002), increased after L-theanine administration, compared to the placebo administration, in individuals who were sub-grouped into the lower half by the median split based on the mean pretreatment scores.

Discussion In this placebo-controlled study, stress-related symptoms assessed with SDS, STAI-T, and PSQI scores decreased, while BACS verbal fluency and executive function scores improved following four weeks L-theanine administration.

The present study aimed to examine the effects of four weeks L-theanine administration (200 mg/day, four weeks) in a healthy population, i.e., individuals without any major psychiatric disorder.

The PSQI subscale scores for sleep latency, sleep disturbance, and use of sleep medication reduced after L-theanine administration, compared to the placebo administration (all p < 0.05).

The effects on stress-related symptoms were broad among the symptom indices presented in the study, although a comparison to the placebo administration somewhat limits the efficacy of L-theanine administration for some sleep disturbance measurements.

For cognitive functions, BACS verbal fluency and executive function scores improved after four weeks L-theanine administration.

PMID: 31623400 This randomized, placebo-controlled, crossover, and double-blind trial aimed to examine the possible effects of four weeks L-theanine administration on stress-related symptoms and cognitive functions in healthy adults.

The anti-stress effects of L-theanine (200 mg/day) have been observed following once- [ 33 , 34 ] and twice daily [ 35 ] administration, while its attention-improving effects have been observed in response to treatment of 100 mg/day on four separate days [ 36 ] and 200 mg/day single administration [ 37 ], which was further supported by decreased responses in functional magnetic resonance imaging [ 38 ].

These results suggest that four weeks L-theanine administration has positive effects on stress-related symptoms and cognitive function in a healthy population.

Reddit Depression Regimens cont’d

Previous posts on the topic of scraping reddit data from the depressionregimens subreddit:

Reddit Depression Regimens – Topic Modeling

Reddit Depression Regimens – Topic Modeling cont’d

Next we will create some plots with JavaScript. For example, it would be interesting to see how often specific psychotropic medications and supplements are mentioned in the text data.
Below is a chart with the frequencies of the most common antidepressant medications. The counts were computed by combining the frequencies of the brand name and the chemical (generic) name (for example, the Wellbutrin count is wellbutrin (54) + bupropion (27) = 81).

The data was generated using python and exported as a .csv file, with columns ‘term’ and ‘freq’.
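The data for this post was generated in Python, but the merging of brand-name and generic-name counts could look like the JavaScript sketch below. The `BRAND_TO_GENERIC` table and the function names are hypothetical; dropping names with zero mentions is an assumption, chosen so that unmentioned medications do not appear in the chart data.

```javascript
// Minimal sketch: merge brand-name and generic-name counts into one
// {term, freq} row per medication, like the exported .csv file.
// Assumption: this small brand -> generic table is illustrative only.
const BRAND_TO_GENERIC = {
  wellbutrin: "bupropion",
  lexapro: "escitalopram",
  zoloft: "sertraline",
};

function combineCounts(rawCounts, brandToGeneric) {
  const rows = [];
  for (const [brand, generic] of Object.entries(brandToGeneric)) {
    const freq = (rawCounts[brand] || 0) + (rawCounts[generic] || 0);
    // Names never mentioned in the text are left out of the chart data.
    if (freq > 0) rows.push({ term: brand, freq });
  }
  // Most frequent first, matching the sorted bar chart.
  return rows.sort((a, b) => b.freq - a.freq);
}

// Serialize rows as CSV text with the same 'term' and 'freq' columns.
function toCsv(rows) {
  return ["term,freq", ...rows.map((r) => `${r.term},${r.freq}`)].join("\n");
}
```

With the Wellbutrin example, `combineCounts({ wellbutrin: 54, bupropion: 27 }, BRAND_TO_GENERIC)` yields a single row `{ term: "wellbutrin", freq: 81 }`.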

HTML part:

<html>
<head>
  <script src="https://cdn.plot.ly/plotly-2.0.0.min.js"></script>
  <script src="https://d3js.org/d3.v5.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/chart.js@2.9.3"></script>
  <script src="script1.js"></script>
</head>
<body onload="draw()">
chart 1
<div id="jsdiv" style="border:solid 1px red"></div>
chart 2
<canvas id="chart"></canvas>
</body>
</html>

JS part:

function makeChart(meds) {
  // meds is the array returned by d3.csv: one object per row of the .csv file,
  // with string values, e.g. {term: "wellbutrin", freq: "81"}

  var hist_labels = meds.map(function(d) {
    return d.term;
  });
  var hist_counts = meds.map(function(d) {
    return +d.freq; // convert the freq string to a number
  });

  // Pair each label with its count, then sort by count (highest first)
  var arrayOfObj = hist_labels.map(function(d, i) {
    return {
      label: d,
      data: hist_counts[i] || 0
    };
  });
  var sortedArrayOfObj = arrayOfObj.sort(function(a, b) {
    return b.data - a.data;
  });

  // Split the sorted pairs back into parallel label and data arrays
  var newArrayLabel = [];
  var newArrayData = [];
  sortedArrayOfObj.forEach(function(d) {
    newArrayLabel.push(d.label);
    newArrayData.push(d.data);
  });

  // Horizontal bar chart (Chart.js 2.x syntax) drawn on <canvas id="chart">
  var chart = new Chart('chart', {
    type: "horizontalBar",
    data: {
      labels: newArrayLabel,
      datasets: [
        {
          data: newArrayData,
          backgroundColor: "#33AEEF"
        }]
    },
    options: {
      scales: {
        yAxes: [{
          scaleLabel: {
            display: true,
            labelString: 'med name'
          }
        }],
        xAxes: [{
          scaleLabel: {
            display: true,
            labelString: 'freq'
          }
        }]
      },
      legend: {
        display: false
      },
      title: {
        display: true,
        text: 'Frequencies of common antidepressants'
      }
    }
  });
}

// Called from <body onload="draw()">: request the data using D3, then build the chart
function draw() {
  d3
    .csv("med_list_counts_df.csv")
    .then(makeChart);
}

We can generate charts for other medication/supplement lists using the same code. Below is a plot with the frequencies of common antipsychotics. As you can see, antipsychotics are not mentioned as frequently as antidepressants, and many names in the input list were not mentioned at all (such as haldol or thorazine), so they do not show up in the chart.

Other medications and common supplements mentioned:

Reddit Depression Regimens – Topic Modeling cont’d

In the previous posts we applied LDA topic modeling to text documents collected from the subreddit depressionregimens. Here I will continue with the results from the derived topic model: obtaining the most representative text for each topic. As stated before, the chosen model has ten topics. LDA assumes that each document is composed of multiple topics, with each topic assigned a probability, and that each topic is composed of multiple words, with each word assigned a probability.

Previous post: Reddit Depression Regimens – Topic Modeling

Since each document is composed of multiple topics, for each topic we can find the document with the highest probability for that topic; that document is our most representative document for the topic.
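Finding the most representative document amounts to an argmax down each column of the document-topic probability matrix. A minimal JavaScript sketch, assuming the per-document topic probabilities have already been extracted from the trained LDA model into a dense array (the function name, post ids, and probabilities are hypothetical):

```javascript
// Minimal sketch: docTopics has one row per document and one column per
// topic; for each topic, pick the document with the highest probability.
function mostRepresentativeDocs(docTopics, docIds) {
  if (docTopics.length === 0) return [];
  const numTopics = docTopics[0].length;
  const result = [];
  for (let k = 0; k < numTopics; k++) {
    let best = 0;
    for (let d = 1; d < docTopics.length; d++) {
      // Strict ">" keeps the first document on ties.
      if (docTopics[d][k] > docTopics[best][k]) best = d;
    }
    result.push({ topic: k + 1, docId: docIds[best], prob: docTopics[best][k] });
  }
  return result;
}

// Toy example: three documents, three topics (each row sums to 1).
const example = mostRepresentativeDocs(
  [[0.45, 0.10, 0.45], [0.20, 0.53, 0.27], [0.30, 0.30, 0.40]],
  ["post_a", "post_b", "post_c"]
);
```

Note that one document can be the most representative for several topics, as `post_a` is for topics 1 and 3 in the toy data.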

Topic 1

(‘feel’, 0.040), (‘year’, 0.026), (‘thing’, 0.022), (‘symptom’, 0.020), (‘brain’, 0.019), (‘start’, 0.018), (‘time’, 0.017), (‘make’, 0.015), (‘issue’, 0.015), (‘lot’, 0.014)

Most representative post id with topic 1 probability of 0.45:
Full text here: https://www.reddit.com/r/depressionregimens/comments/gib17h

“Blank Mind Syndrome” – Sub group of specific symptoms including: – Loss of Internal Monologue, lack of coherent automatic thoughts, no track of time passage, lack of self insight – Depersonalisation/Derealization Feeling detached, having no “sense of self”, missing mental features, having no emotional autobiography, feeling as if every day is the same, loss of relationship or emotional attachments, feeling detached from external reality – Cognitive Decline, Loss of Visual imagination, inability to think in a deep or complex way, inability to hold information, loss of past learned skills and knowledge. – Complete Lack of goal-directed motivation, having no automatic self direction, no long term goals – Anhedonia – inability to enjoy or derive pleasure, nothing to look forward to, no bodily joy, satasfaction and so on – Lack of atmosphere/deepness of the outside reality, inability to appreciate beauty, things look flat and insignificant. All symptoms in various severity of course, It’s time to talk, what is this condition exactly, Did you suffer from depression your entire life? Is this episodic? how are you planning to solve it? how did you acquire it? had any professional been aware of it? Is it medication induced? Is there any outside outlet acknowledging this specific phenomena? How much time do you suffer from it? What were you diagnosed with? Was it sudden or progressively? Had anything helped at all? Would you join a group for people suffering the same condition? Is anyone interested in doing so? Please do respond!

Topic 2

people 0.044, depression 0.037, doctor 0.028, psychiatrist 0.020, make 0.020, bad 0.016, therapy 0.016, therapist 0.015, find 0.014, problem 0.013

Most representative post for this topic, with probability for topic 2 of 0.53: https://www.reddit.com/r/depressionregimens/comments/iij4tr

I talked to him today, he says all my problems are my choice and I choose to be lazy, suicidal, depressed etc. Is he right?,Dude… if he believes this then he must also believe that his career is total quackery. Get a new psychiatrist immediately. What a piece of shit.,absolutely not, please get a new psychiatrist!! you don’t choose to be suicidal or depressed, and in my experience, depression causes laziness more often than not. it’s worrisome that a professional outright said this to you and honestly I would report him if you can. that’s such a harmful thing to say to anyone suffering from such issues and to say it to the wrong person could be really catastrophic. i’m sorry he’s a dick to you, don’t listen to his bullshit. if it was so easy as to choose not to be depressed then nobody would fucking be depressed. it’s like he thinks people enjoy feeling this way ugh,OMG please please PLEASE never go back there. I once had a psychiatrist tell me I was gonna end up on a street corner with a sign (spoiler alert: I have a career and own a house). I got up and left and never looked back. Remember that YOU are a huge part of your mental health journey. It’s a collaborative effort between you, your psychiatrist, therapist (if you have one), and any other professional you choose to involve. You always have a say, and if something doesn’t seem right, you don’t have to go along with it. Your feelings are ALWAYS valid—don’t ever let anyone tell you differently. You are not alone in this. So many of us are depressed, anxious, suicidal, attention deficit, bipolar, lazy…these are NOT choices. Who would choose to be this way? There are plenty of helpful professionals out there, just make sure you screen them carefully. I believe in you and wish you well!!! …

Topic 3

day 0.037, thing 0.035, feel 0.033, make 0.024, find 0.017, good 0.016, exercise 0.016, eat 0.013, walk 0.013, lot 0.013

https://www.reddit.com/r/depressionregimens/comments/dztdw9

Topic probability: 0.53

Wanted to share something that I’ve recently found to help when I’m struggling to find motivation to complete basic chores. This one specifically deals with laundry, but it can apply to other tasks as well. If you’re like me, you can have laundry sitting there for weeks not being put away. The mountain of clothing is so overwhelming that I just ignore it all together. I’m also an all-or-nothing person; I just wait and wait until a good day when I’ll have enough energy to get it done. Those days are exceedingly rare, so that mountain of clothes will sit there for a loooong time, stressing me out and depressing me even more. I’m trying to switch my mindset to not feeling like I need to take on such giant tasks all at once. I decided to break up the tasks into smaller ones. For the mixed load of laundry that needed to be put away, I told myself I only need to put away the socks and underwear today. Then tomorrow I put away the shirts. The next day, fold pants, and the next everything else that goes on hangers. These smaller tasks only take like 5-10 minutes each, and it’s satisfying to see the pile of clothes dwindle every day versus sit there ominously for several weeks. If you’re feeling overwhelmed, break up your tasks into very small, easily attainable goals. Go easy on yourself and do what you can for the day. Even just the tiniest amount of progress is a good thing.,great advice. Anytime you get anxiety over a task or a situation seems to complex or overwhelming. Just break in down into manageable pieces. Doing SOMETHING is always better than nothing even if it seems like too little or not enough or w/e.,I saw a meme about ‘anything worth doing is worth doing badly’ that addresses this. I try and remember that some days. Us perfectionists want to always do 100%. 
But in a lot of things (not everything, obviously, just as a general rule) doing 50% of the job, or 90% of the job, is way better then the 0% of the job we do because of that crippling dedication to doing 100%. Not an excuse for doing bad jobs on the stuff that really matters, but can be a much healthier way to approach doing general day-to-day stuff…

Topic 4

ssris 0.027, antidepressant 0.024, effect 0.024, drug 0.022, side_effect 0.020, depression 0.019, serotonin 0.016, prescribe 0.014, treat 0.013, ssri 0.012

Reddit post: https://www.reddit.com/r/depressionregimens/comments/bheg7d

Topic probability: 0.64

Hey y’all, this is a repost of the stickied post made by /u/jugglerofworlds, who appears to have deleted their account and their post along with it. I’ve edited it a little and will continue to keep it updated as needed. Suggestions are welcome. As the former post was, I’m trying to keep this confined to prescription medications, and not natural/herbal remedies (though I recognize that they definitely can be helpful means of treatment). I’m also typically avoiding medications that have been withdrawn from the market and thus aren’t really prescribed. In a future revision of this post I hope to add an additional column featuring which medications are available where, as some of these are approved in European countries but not in the U.S., and vice versa. # Icon key * ✔️ = approved to treat condition by a regulatory agency (FDA, EMA, ANSM, etc) * ➕ = approved as an adjunct treatment by a regulatory agency, to be used in combination with other medications to treat a condition (may or may not be used off-label as a monotherapy) * 🏷️ = Off label use; widely prescribed for condition but not necessarily rigorously studied for it * ⚠️ = experimental medication; in FDA Phase III trials or pending approval # Selective Serotonin Reuptake Inhibitors (SSRIs) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |citalopram|Celexa|✔️|🏷️| |escitalopram|Lexapro|✔️|✔️| |fluoxetine|Prozac|✔️|✔️| |fluvoxamine|Luvox/Luvox CR|✔️|✔️| |paroxetine|Paxil/Paxil CR|✔️|✔️| |sertraline|Zoloft|✔️|✔️| # Serotonin Modulator and Stimulators (SMS) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |vortioxetine|Trintellix|✔️|🏷️| |vilazodone|Viibryd|✔️|🏷️| # Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |venlafaxine|Effexor/Effexor XR|✔️|✔️| |desvenlafaxine|Pristiq|✔️|🏷️| |duloxetine|Cymbalta|✔️|✔️| |milnacipran|Savella|✔️|✔️| |levomilnacipran|Fetzima|✔️|🏷️| 
|atomoxetine|Strattera|⚠️|⚠️| # Tricyclics (TCAs) ## TCAs with a preference for serotonin |Generic name|Brand name(s)|Treats depression|Treats anxiety|…

Topic 5

treatment 0.035, ketamine 0.028, year 0.022, work 0.021, drug 0.017, hope 0.015, hear 0.012, lithium 0.011, people 0.010, infusion 0.009

Reddit post: https://www.reddit.com/r/depressionregimens/comments/axtnj8

Topic probability: 0.58

https://www.washingtonpost.com/health/2019/03/06/biggest-advance-depression-years-fda-approves-novel-treatment-hardest-cases The Food and Drug Administration approved a novel antidepressant late Tuesday for people with depression that does not respond to other treatments — the first in decades to work in a completely new way in the brain. The drug, a nasal spray called esketamine, has been eagerly anticipated by psychiatrists and patient groups as a powerful new tool to fight intractable depression. The spray acts within hours, rather than weeks or months as is typical for current antidepressants, and could offer a lifeline to about 5 million people in the United States with major depressive disorder who haven’t been helped by current treatments. That accounts for about one in three people with depression. “This is undeniably a major advance,” said Jeffrey Lieberman, a Columbia University psychiatrist. But he cautioned much is still unknown about the drug, particularly regarding its long-term use. “Doctors will have to be very judicious and feel their way along,” he said. The label for the drug will carry a black box warning – the most serious safety warning issued by the FDA. It will caution users they could experience sedation and problems with attention, judgment and thinking, and that there’s potential for abuse and suicidal thoughts. People who take esketamine will have to be monitored for at least two hours after receiving a dose to guard against some of these side effects…

Topic 6

work 0.053, anxiety 0.030, mg 0.025, bad 0.020, high 0.020, vitamin 0.018, diet 0.015, supplement 0.014, post 0.012, literally 0.011

Reddit post: https://www.reddit.com/r/depressionregimens/comments/alh4r3

Topic probability: 0.52

About 3 or 4 years ago, I developed a severe form of anxiety disorder where it manifested in panic attacks characterized by intense bouts of nausea, gagging, and retching. It didn’t usually get bad enough to get to vomiting, though it did in a few instances (in which I went to the hospital afterwards). My body responds to stress naturally by gagging and nausea. So imagine being anxious all the time but also NAUSEOUS 24/7, and I mean literally 24/7 without any respite. At times I was seriously considering suicide because of how bad I felt all the time every day. The whole thing started I think because I had to present at a large conference with thousands of people in attendance, and I had a very bad experience being insulted by some people at a previous iteration of this conference years ago. I was commuting to work one day (before the conference) and suddenly got this massive bout of nausea where I felt like I was dying. I realized however that this was my body telling me I have stagefright. I expected my nausea to evaporate once I finished speaking, as it usually would have in the past. Except that it didn’t. It stayed, and remained with me for years. I tried everything but avoided antidepressants for the longest time due to the bad rep they get. 
I tried the following medications: * Ginger – in various forms – for nausea (didn’t work) * Peppermint – in various forms – for nausea (didn’t work) * Ondansetron (zofran) – 4 mg; as needed – for nausea (didn’t work) * Chlordiazepoxide/clidinium bromide (librax) – 5 mg; once daily – for nausea and anxiety (didn’t work) * Pyridoxine/doxylamine (diclectin) – 10 mg pyridoxine, 10 mg doxylamine; 2 tablets at bedtime – for nausea (didn’t work) * Metoclopramide – 1 tablet daily – for nausea (didn’t work) * Domperidone – 10 mg; once daily – for nausea (didn’t work) * Propranolol – 10 mg; twice daily – for anxiety (didn’t work) * Prochlorazapine – 10 mg; twice daily – for nausea (didn’t work) * Lorazepam (Ativan) – 1 mg; 1 tablet at bedtime – for anxiety (didn’t work; just made me really sleepy) * Pantoprazole (Tecta) – 1 tablet daily – for nausea (didn’t work) * Dimenhydrinate (Gravol) – 1 tablet as needed – for nausea (didn’t work) * Nabilone (cesamet) – 0.5 mg as needed – for nausea (worked for nausea but not anxiety, and gave me a really uncomfortable high) * Clomipramine (Anafranil) – 10 mg. once daily – for anxiety (didn’t try properly due to side-effects) I was afraid even of getting out of my own house. I was afraid of meeting people. I was afraid of leaving my own room – the only place where I felt somewhat at ease and the nausea wasn’t THAT bad. The only thing that worked somewhat to relieve the nausea was chewing on things, whether that meant food at mealtimes, or fennel seeds, or sucking on mints/cough drops. So I carried mints and fennel seeds with me at all times no matter where I was – including in the washroom in my own house and even when I wanted to take a shower I had to have them nearby otherwise I would literally throw up in the shower. But these were not long-term cures to my problem and only a short alleviation of the symptoms (and not that effective if I was more anxious than usual). 
I somehow graduated from university with a degree in neuroscience and fought through this nausea-anxiety for 2 years doing so. My graduation ceremony – which was supposed to be a happy occasion – was marred by constant nausea and me going through at least 3 entire tins of mints because my body handles excitedness the same way as it does for anxiety. Literally nothing was working and I was at my wit’s end. So I went downtown Toronto and bought CBD oil from a dispensary. I only did this because I was literally desperate, even though I had never done any recreational drugs in my life upto that point (except caffeine), and even though I had a horrible experience with nabilone (synthetic THC for cancer patients to reduce their nausea) so I was really kind of anxious about even using that. But it worked…

Reddit Depression Regimens – Topic Modeling

Text data (top posts and the top comments for those posts) was downloaded from the subreddit depressionregimens (https://www.reddit.com/r/depressionregimens/). Data was grouped by post id; in total there were 101 such ids, and therefore 101 text documents. After collecting the data, the following data cleaning steps were performed:

any emails were removed from text
urls were removed (http and www)
common contractions were expanded (‘ain’t’ >> ‘is not’; ‘bday’ >> ‘birthday’; ‘don’t’ >> ‘do not’; etc.)
new line characters were removed
single quotes were removed
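Grouping by post id and the cleaning steps above might be sketched as follows. The regular expressions and the three-entry contraction table are illustrative assumptions, not the exact rules used in the original Python pipeline; curly apostrophes are normalized first so that contractions can be matched.

```javascript
// Minimal sketch of the grouping and cleaning steps (illustrative rules).
const CONTRACTIONS = {
  "ain't": "is not",
  "don't": "do not",
  "bday": "birthday",
};

// Concatenate all {postId, text} records (post + comments) per post id.
function groupByPostId(records) {
  const docs = new Map();
  for (const { postId, text } of records) {
    docs.set(postId, docs.has(postId) ? docs.get(postId) + " " + text : text);
  }
  return docs;
}

// Escape regex metacharacters so a contraction can be matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function cleanText(text) {
  let t = text.replace(/[’‘]/g, "'");             // normalize curly apostrophes
  t = t.replace(/\S+@\S+\.\S+/g, " ");             // remove emails
  t = t.replace(/(https?:\/\/|www\.)\S+/g, " ");   // remove urls (http and www)
  for (const [short, full] of Object.entries(CONTRACTIONS)) {
    t = t.replace(new RegExp(`\\b${escapeRegExp(short)}\\b`, "gi"), full);
  }
  t = t.replace(/[\r\n]+/g, " ");                  // remove new line characters
  t = t.replace(/'/g, "");                         // remove single quotes
  return t.replace(/\s+/g, " ").trim();            // collapse leftover spaces
}
```

Expanding contractions before removing single quotes matters: once the quotes are gone, "don't" would become "dont" and no longer match the table.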

After the data cleaning steps were complete, sentences were tokenized into words and punctuation was removed. English stop words were removed from the documents. Python’s gensim.models.phrases.Phraser() was used to detect common phrases (bigrams). Lemmatization and part-of-speech (POS) tagging were then performed. Only lemmatized words with certain POS tags were kept, including nouns, adjectives, verbs, adverbs, and proper nouns. Proper nouns were kept in case medication or supplement names get tagged as such. We are interested in how reddit users describe their experiences with certain psychotropic medications and supplements, and therefore the chosen POS tags are the ones relevant for descriptions.
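Lemmatization and POS tagging rely on NLP libraries and are not reproduced here, but tokenization, stop word removal, and bigram detection can be sketched in JavaScript. The phrase score below, (count(a b) - minCount) * vocabSize / (count(a) * count(b)), follows the Mikolov et al. phrase-scoring formula that gensim's default scorer is based on; this is not gensim's code, the stop-word list is a tiny hypothetical one, using the number of distinct single tokens as `vocabSize` is an assumption, and the `minCount`/`threshold` defaults are toy settings for a tiny corpus.

```javascript
// Tiny illustrative stop-word list (hypothetical, not the one used in the post).
const STOP_WORDS = new Set(["i", "am", "on", "a", "the", "of", "to", "and", "my", "is", "was"]);

// Lowercase, keep alphabetic tokens (drops punctuation), remove stop words.
function tokenize(text) {
  return (text.toLowerCase().match(/[a-z]+/g) || []).filter((w) => !STOP_WORDS.has(w));
}

// Count single tokens and adjacent token pairs across all documents.
function countUnigramsAndBigrams(docs) {
  const uni = new Map();
  const bi = new Map();
  for (const tokens of docs) {
    tokens.forEach((t, i) => {
      uni.set(t, (uni.get(t) || 0) + 1);
      if (i > 0) {
        const key = tokens[i - 1] + " " + t;
        bi.set(key, (bi.get(key) || 0) + 1);
      }
    });
  }
  return { uni, bi };
}

// Join adjacent pairs whose phrase score exceeds the threshold into "a_b".
function applyPhrases(docs, minCount = 1, threshold = 0.5) {
  const { uni, bi } = countUnigramsAndBigrams(docs);
  const vocabSize = uni.size; // assumption: distinct single tokens
  const score = (a, b) =>
    (((bi.get(a + " " + b) || 0) - minCount) * vocabSize) / (uni.get(a) * uni.get(b));
  return docs.map((tokens) => {
    const out = [];
    for (let i = 0; i < tokens.length; i++) {
      if (i + 1 < tokens.length && score(tokens[i], tokens[i + 1]) > threshold) {
        out.push(tokens[i] + "_" + tokens[i + 1]);
        i++; // skip the second word of the joined pair
      } else {
        out.push(tokens[i]);
      }
    }
    return out;
  });
}
```

On a toy corpus where "low dose" occurs twice, the pair is joined into `low_dose`, the same kind of token that appears in the processed post below.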

An example of an original post is presented below. As humans, we can tell that the post below is about ketamine and that the user had a positive experience with this treatment. We are interested in the verbs, adverbs, adjectives, and nouns that the author used to describe their experience.

I wanted to post this to give hope to those who need a little extra. I know how excruciating both having and battling treatment options for depression and anxiety can be. I’ve seen what I thought was rock bottom. I’ve been to my actual rock bottom, but I am free now.

One year ago, I was sitting in the recliner at my local ketamine clinic receiving my first infusion. The day before I had outlined my plan for suicide and had all my affairs in order, but a friend I had confided in about my depression had a “feeling” I was in a very dangerous place inside my head. I don’t know how she knew what I was planning, but thank goodness she had the foresight and the strength to push me to try one. more. thing. I had heard (and shared) quite a few podcasts from people who had been through treatment, administered the treatments, and even those who had been doing the research behind it all. had been chatting about ketamine as a potential treatment since nothing else had ever worked. She researched clinics, called them all to pick the best one, and made the appointment on an urgent basis getting me in that day.

She took me to the doctor, and after a while, I told him about my plan. I told him that I would give this a try, but this was my last try. After 25 years of my brain being a lab rat for every pill imaginable, years and years of therapy and everything else you can imagine, I was just so tired. He was sympathetic, caring and sat with me for quite a while. Then he started that first IV.

I won’t bore you with all the details of treatment (feel free to ask), but I can say that after the first treatment – one year ago today – I didn’t want to die anymore. I haven’t wanted to since. From time to time, the depression will creep in a little too much for comfort, but I have a lot of self-care tricks to help me get through it. And if It comes down to it, I go in for a booster treatment.
In the past year, I have had 11 infusions. The last 3 were to help me get off the last, and most difficult antidepressant that I ever took. Now, I’m on a very low dose of Lexapro, which I honestly doubt I even need. But I’m stable. I actually know what happiness feels like. And most importantly, I’m alive.
Thanks for reading.

After we perform the steps described above (data cleaning, stop-word removal, lemmatization, keeping only terms with specific POS tags, and extracting common bigrams), the post above is reduced to the following:

want post give hope need little extra know excruciating battle treatment option depression anxiety see think rock bottom actual rock bottom free year ago sit recliner local ketamine clinic receive first infusion day outline plan suicide affair order friend confide depression feel dangerous place head know know plan thank goodness foresight strength push try thing hear share quite podcast people treatment administered treatment even research chat ketamine potential treatment else ever work research clinic call pick good make appointment urgent basis get day take doctor tell plan tell would give try last try year brain lab rat pill imaginable year year therapy else imagine tired sympathetic caring sit quite start first bear detail treatment feel free ask say first treatment year ago today want die anymore want time time depression creep little much comfort lot self_care trick help come go booster treatment year infusion last help last difficult antidepressant ever take low_dose lexapro honestly doubt even need stable actually know happiness feel importantly alive thank read
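Part of this cleaning can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's code. The stop-word list is a tiny hypothetical one. Lemmatization and POS filtering are omitted because they need a tagger (spaCy, for example). Bigrams are merged with a simple frequency threshold (`min_count`, an assumed parameter) rather than the scoring function that phrase detectors such as gensim's `Phrases` use. Merged pairs are joined with an underscore, matching tokens like `self_care` and `low_dose` in the output above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# full list such as NLTK's or spaCy's.
STOP_WORDS = {"i", "a", "an", "the", "to", "of", "and", "my", "me", "is",
              "was", "it", "this", "that", "for", "in", "on", "at", "but"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens (punctuation is dropped), remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def merge_bigrams(docs, min_count=2):
    """Join adjacent token pairs that occur at least min_count times with '_'."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                out.append(doc[i] + "_" + doc[i + 1])
                i += 2  # skip the second word of the merged pair
            else:
                out.append(doc[i])
                i += 1
        merged_docs.append(out)
    return merged_docs

posts = [
    "Self care helps me a lot.",
    "My self care routine is simple.",
    "A low dose of Lexapro.",
]
merged = merge_bigrams([tokenize(p) for p in posts], min_count=2)
print(merged)
# [['self_care', 'helps', 'lot'], ['self_care', 'routine', 'simple'], ['low', 'dose', 'lexapro']]
```

Only "self care" reaches the threshold in this toy corpus, so "low dose" stays split. On a full subreddit corpus, frequent pairs like that one would be merged.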

For topic modeling, we are interested in the general topics discussed in this particular subreddit, and Latent Dirichlet Allocation (LDA) is designed for exactly this sort of task. LDA is an unsupervised method for finding topics in text data. Our text is composed of documents; here, each document combines a post and its top comments for a specific post id. LDA assumes that each document is composed of different topics and each topic is composed of different words. Therefore, documents can share topics and topics can share words, but with different probabilities.
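To make these assumptions concrete, here is a minimal LDA fitted with collapsed Gibbs sampling in pure Python. The source does not say which library was used. Common implementations such as gensim's `LdaModel` use variational inference instead, so treat this as an illustration of the model rather than of the author's pipeline. The priors `alpha` and `beta` and the iteration count are illustrative defaults. Each document gets a distribution over topics (`theta`), and each topic gets a distribution over words (`phi`). This is the overlap described above: the same word can carry probability in several topics.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | document d)
    and phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts ...
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the full conditional.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in topics]
    return theta, phi, vocab

docs = [
    ["sleep", "bed", "morning", "wake"],
    ["sleep", "night", "bed", "wake"],
    ["ssris", "doctor", "side_effect", "mg"],
    ["doctor", "ssris", "mg", "drug"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=50)
for k, topic in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: topic[v], reverse=True)[:3]
    print(k + 1, [vocab[v] for v in top])
```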

Since the problem is unsupervised, there are no labels, just text, and we don’t know how many topics our subreddit contains. There is no exact formula for the optimal number of topics in an LDA model. One common approach, which we implement here, is to loop through different numbers of topics, calculate a coherence score for each model, and choose the model with the highest score. In this specific case, I built models for 2, 4, 8, …, 14 topics and plotted the corresponding coherence scores. As the chart shows, the highest value occurs at four topics, with further peaks at 10 and 12 topics.
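The source does not say which coherence measure was used. For example, gensim's `CoherenceModel` offers `u_mass`, `c_v` and others. The sketch below uses UMass coherence because it needs only document frequencies. Its scores are usually negative, and values closer to zero indicate top words that co-occur more often. The function names are hypothetical, and the scores passed to `best_num_topics` are made up.

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence for one topic.

    top_words must be ordered from most to least probable, and every word is
    assumed to occur in at least one document (true when the vocabulary was
    built from `docs`).
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_freq(top_words[l]))
    return score

def model_coherence(topics, docs):
    """Average UMass coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

def best_num_topics(scores):
    """Pick the number of topics whose model has the highest coherence."""
    return max(scores, key=scores.get)

docs = [["sleep", "bed"], ["sleep", "bed"], ["ssris", "doctor"]]
print(umass_coherence(["sleep", "bed"], docs))     # log(3/2): the words co-occur
print(umass_coherence(["sleep", "doctor"], docs))  # log(1/2): they never co-occur
print(best_num_topics({2: -1.9, 4: -1.2, 6: -1.5}))  # 4 (made-up scores)
```

In the loop described above, `scores` would map each tried number of topics to the `model_coherence` of that model's top words.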

Now let’s see what the topics are.

Number of topics = 4

Topic 1 words	Word prob	Topic 2 words	Word prob	Topic 3 words	Word prob	Topic 4 words	Word prob
feel	0.037	day	0.042	depression	0.017	depression	0.019
thing	0.026	work	0.024	effect	0.014	people	0.017
depression	0.022	feel	0.020	antidepressant	0.013	year	0.016
make	0.021	time	0.020	ssris	0.012	give	0.016
life	0.019	sleep	0.018	doctor	0.012	treatment	0.015
time	0.013	good	0.014	anxiety	0.010	month	0.013
bad	0.013	thing	0.012	side_effect	0.010	start	0.013
good	0.012	start	0.010	mg	0.010	find	0.012
lot	0.010	bed	0.009	drug	0.009	hope	0.012
depressed	0.010	exercise	0.009	psychiatrist	0.009	ketamine	0.011

If we choose 10 topics:

Topic 1 words	Word prob	Topic 2 words	Word prob	Topic 3 words	Word prob	Topic 4 words	Word prob	Topic 5 words	Word prob
feel	0.039	people	0.044	day	0.037	ssris	0.027	treatment	0.035
year	0.026	depression	0.037	thing	0.035	antidepressant	0.024	ketamine	0.028
thing	0.022	doctor	0.028	feel	0.033	effect	0.024	year	0.022
symptom	0.020	psychiatrist	0.020	make	0.024	drug	0.022	work	0.021
brain	0.019	make	0.020	find	0.017	side_effect	0.020	drug	0.017
start	0.018	bad	0.016	good	0.016	depression	0.019	hope	0.015
time	0.017	therapy	0.016	exercise	0.016	serotonin	0.016	hear	0.012
make	0.015	therapist	0.015	eat	0.013	prescribe	0.014	lithium	0.011
issue	0.015	find	0.014	walk	0.013	treat	0.013	people	0.010
lot	0.014	problem	0.013	lot	0.013	ssri	0.012	infusion	0.009
Topic 6 words	Word prob	Topic 7 words	Word prob	Topic 8 words	Word prob	Topic 9 words	Word prob	Topic 10 words	Word prob
work	0.053	time	0.033	sleep	0.053	experience	0.039	life	0.062
anxiety	0.030	make	0.028	day	0.037	day	0.030	feel	0.030
mg	0.025	depression	0.015	time	0.030	feel	0.029	depression	0.029
bad	0.020	long	0.015	bed	0.024	depression	0.024	thing	0.020
high	0.020	call	0.014	start	0.024	mind	0.020	find	0.019
vitamin	0.018	depressed	0.014	feel	0.023	give	0.017	good	0.017
diet	0.015	feeling	0.013	morning	0.020	month	0.017	live	0.017
supplement	0.014	people	0.013	wake	0.018	good	0.015	bad	0.014
post	0.012	read	0.013	night	0.014	week	0.013	change	0.014
literally	0.011	focus	0.013	hour	0.013	back	0.012	year	0.013

I think that, even with this small sample size (101 top posts and their top comments), the LDA results give us a good understanding of what users discuss in the depressionregimens subreddit. There are discussions about life, feeling depressed, how long the depression has been going on (mentions of week/month/year), how the day goes (Topic 8), specific treatments (Topic 5), supplements (Topic 6), SSRIs and side effects (Topic 4), and exercise (Topic 3).

We can then apply the chosen model to each document to obtain the topic distribution by document. For example, we can choose the model with 10 topics, obtain the topic distribution for every document, and determine the topic with the maximum probability for each one. Then we can select sample documents that have the highest probability for a given topic. Take topic 2, which contains the following word distribution:

(‘people’, 0.04), (‘depression’, 0.038), (‘doctor’, 0.028), (‘psychiatrist’, 0.020),
(‘make’, 0.020), (‘bad’, 0.016), (‘therapy’, 0.016), (‘therapist’, 0.015), (‘find’, 0.014),
(‘problem’, 0.013)

Here is a document in which topic 2 has the maximum probability:

“This might be an unconventional treatment considering that many of us post about their experience with various drugs.
I myself struggled with mental health in the past. I can say my mental health issues in the past were to 90% biological (hormonal problems). Once I treated the causes, over time the upwards spiral in my personal wellbeing (and life in general) started again.
In early twenties, my life was starting to go down the gutter. My life started to fall apart in every domain. I was severely depressed. I found out that some of my hormones were very low. I started hormone replacement. Whereas before my life was a nightmare, it has been great ever since. I could even get off the SSRIs I was on.
I wrote an article about my journey. How Hormones Destroyed and Saved My Life.
My dream is to live in a world where no one is held back from living an at least decent life the way I was. Even though not my fault, it is my life. And thus my responsibility. Without accepting and acting on that I just don´t know where I would be today. For sure I wouldn´t be writing this. Hope you find value in it… “

(The full text is available at https://www.reddit.com/r/depressionregimens/comments/lef32x )

The topic distribution for this document is as follows:
[1: 0.074, 2: 0.338, 3: 0.032, 4: 0.069, 5: 0.083, 6: 0.084, 7: 0.052, 8: 0.054, 9: 0.073, 10: 0.153]
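The selection step described above can be sketched with plain dictionaries that map topic ids to probabilities. The function names are hypothetical. In practice the distributions would come from the fitted model (for instance gensim's `LdaModel.get_document_topics`, if gensim was used, which the source does not state). The second document is made up for comparison.

```python
def dominant_topic(distribution):
    """Topic id with the highest probability in one document's distribution."""
    return max(distribution, key=distribution.get)

def top_documents_for_topic(doc_distributions, topic, n=1):
    """Ids of the n documents with the highest probability for `topic`."""
    ranked = sorted(
        doc_distributions,
        key=lambda doc_id: doc_distributions[doc_id].get(topic, 0.0),
        reverse=True,
    )
    return ranked[:n]

# Distribution reported above for post lef32x (10-topic model).
lef32x = {1: 0.074, 2: 0.338, 3: 0.032, 4: 0.069, 5: 0.083,
          6: 0.084, 7: 0.052, 8: 0.054, 9: 0.073, 10: 0.153}
# A second, made-up document for comparison.
other = {1: 0.40, 2: 0.10, 10: 0.50}

print(dominant_topic(lef32x))  # 2
print(top_documents_for_topic({"lef32x": lef32x, "other": other}, topic=2))  # ['lef32x']
```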

Lemmatized text:

[‘may’, ‘unconventional’, ‘treatment’, ‘consider’, ‘many’, ‘us’, ‘post’, ‘experience’, ‘various’, ‘drug’, ‘struggle’, ‘mental_health’, ‘say’, ‘mental_health’, ‘issue’, ‘biological’, ‘hormonal’, ‘problem’, ‘treat’, ‘cause’, ‘time’, ‘upwards’, ‘spiral’, ‘personal’, ‘wellbeing’, ‘life’, ‘general’, ‘start’, ‘early’, ‘twenty’, ‘life’, ‘start’, ‘go’, ‘gutter’, ‘life’, ‘start’, ‘fall’, ‘apart’, ‘domain’, ‘severely_depresse’, ‘find’, ‘hormone’, ‘low’, ‘start’, ‘hormone’, ‘replacement’, ‘life’, ‘nightmare’, ‘great’, ‘ever’, ‘since’, ‘could’, ‘even’, ‘ssris’, ‘write’, ‘article’, ‘journey’, ‘hormone’, ‘destroy’, ‘save’, ‘life’, ‘dream’, ‘live’, ‘world’, ‘hold’, ‘back’, ‘live’, ‘least’, ‘decent’, ‘life’, ‘way’, ‘even’, ‘fault’, ‘life’, ‘thus’, ‘responsibility’, ‘accept’, ‘act’, ‘know’, ‘would’, ‘today’, ‘sure’, ‘write’, ‘hope’, ‘find’, ‘value’, ‘opinion’, ‘replace’, ‘hormone’, ‘deficient’, ‘far’, ‘natural’, ‘also’, ‘effective’, ‘artificial’, ‘med’, ‘however’, ‘believe’, ‘hormone’, ‘deficiency’, ‘may’, ‘much’, ‘common’, ‘assume’, ‘people’, ‘never’, ‘get’, ‘hormone’, ‘check’, ‘often’, ‘even’, ‘life’, ‘want’, ‘give’, ‘head’, ‘other’, ‘pull’, ‘trigger’, ‘medication’, ‘claim’, ‘medication’, ‘work’, ‘hormone’, ‘check’, ‘opportunity’, ‘cost’, ‘high’, ‘similar’, ‘experience’, ‘hormone’, ‘hormone’, ‘dangerous’, ‘play’, ‘make’, ‘sure’, ‘talk’, ‘doctor’, ‘monitor’, ‘doctor’, ‘lock’, ‘post’, ‘people’, ‘would’, ‘see’, ‘unlocked’, ‘pm’, ‘otherwise’, ‘leave’, ‘lock’, ‘play’, ‘hormone’, ‘medical’, ‘supervision’, ‘highly’, ‘detrimental’, ‘health’, ‘thyroid’, ‘hormone’, ‘deficient’, ‘know’, ‘other’, ‘start’, ‘take’, ‘mcg’, ‘thyroxine’, ‘treat’, ‘hypothyroidism’, ‘run’, ‘family’, ‘fog’, ‘seem’, ‘lift’, ‘agree’, ‘hormone’, ‘underrated’, ‘come’, ‘depression’, ‘thank’, ‘share’, ‘hormone’, ‘specifically’, ‘testosterone’, ‘direct’, ‘correlation’, ‘dopamine’, ‘high’, ‘test’, ‘high’, 
‘dopamine’, ‘vice’, ‘versa’, ‘generally’, ‘testerone’, ‘wellbutrin’, ‘increase’, ‘libido’, ‘endocrine’, ‘system’, ‘research’, ‘seem’, ‘lag’, ‘research’, ‘treatment’, ‘know’, ‘million’, ‘could’, ‘suffer’, ‘needlessly’, ‘ignore’, ‘op’, ‘entire’, ‘post’, ‘structure’, ‘sway’, ‘people’, ‘way’, ‘link’, ‘closing’, ‘paragraph’, ‘also’, ‘spamme’, ‘numerous’, ‘time’, ‘different’, ‘thing’, ‘cause’, ‘depression’, ‘know’, ‘enough’, ‘dark’, ‘age’, ‘exclusive’, ‘seratonin’, ‘hormone’, ‘receptor’, ‘regulation’, ‘drug’, ‘abuse’, ‘dopamine’, ‘ach’, ‘brain’, ‘damage’, ‘gaba’, ‘glutamate’, ‘imbalance’, ‘relate’, ‘several’, ‘brain’, ‘region’, ‘receptor’, ‘site’, ‘together’, ‘hormone’, ‘conversion’, ‘chain’, ‘adhd’, ‘bp’, ‘level’, ‘bdnf’, ‘several’, ‘type’, ‘disease’, ‘additionally’, ‘low’, ‘end’, ‘hormone’, ‘scale’, ‘total’, ‘free’, ‘may’, ‘feel’, ‘symptom’, ‘other’, ‘would’, ‘conversely’, ‘man’, ‘may’, ‘almost’, ‘nil’, ‘estrogen’, ‘high’, ‘estrogen’, ‘side_effect’, ‘decent’, ‘doctor’, ‘full’, ‘blood’, ‘panel’, ‘hormone’, ‘panel’, ‘include’, ‘ask’, ‘depend’, ‘free’, ‘go’, ‘private’, ‘cost’, ‘uk’, ‘take’, ‘important’, ‘relative’, ‘commit’, ‘find’, ‘thyroid’, ‘level’, ‘way’, ‘back’, ‘thyroid’, ‘problem’, ‘handle’, ‘psych’, ‘med’, ‘need’, ‘depression’, ‘probably’, ‘lifelong’, ‘become’, ‘unmanageable’, ‘thyroid’, ‘cancer’, ‘luckily’, ‘old’, ‘easy’, ‘catch’, ‘get’, ‘point’, ‘hormone’, ‘low’, ‘find’, ‘hormone’, ‘check’, ‘yearly’, ‘perfectly’, ‘normal’, ‘even’, ‘high’, ‘yet’, ‘still’, ‘depressed’, ‘hormone’, ‘may’, ‘help’, ‘people’, ‘many’, ‘still’, ‘depress’, ‘physiological’, ‘duck’, ‘row’, ‘infuriate’, ‘many’, ‘doctor’, ‘refuse’, ‘prescribe’, ‘hrt’, ‘guess’, ‘taboo’, ‘medical’, ‘school’, ‘pull’, ‘tooth’, ‘find’, ‘decent’, ‘doctor’, ‘even’, ‘consider’, ‘apparently’, ‘fear’, ‘cancer’, ‘induce’, ‘hormone’, ‘frankly’, ‘rather’, ‘live’, ‘good’, ‘life’, ‘even’, ‘mean’, ‘get’, ‘cancer’, ‘live’, ‘cancer’, ‘free’, ‘life’, ‘mentally’, ‘miserable’, ‘post’, ‘multiple’, ‘account’, ‘whole’, ‘time’, 
‘person’, ‘post’, ‘often’, ‘article’, ‘different’, ‘account’, ‘sometimes’, ‘claim’, ‘last’, ‘year’, ‘biology’, ‘student’, ‘other’, ‘last’, ‘year’, ‘medicine’, ‘student’, ‘post’, ‘lame’, ‘excuse’, ‘lure’, ‘costumer’, ‘hormetheu’, ‘thank’, ‘share’, ‘disregard’, ‘irrational’, ‘post’, ‘intelligent’, ‘enough’, ‘determine’, ‘right’, ‘see’, ‘sort’, ‘ground’, ‘swell’, ‘business’, ‘activity’, ‘even’, ‘touch’, ‘consultation’, ‘hormone’, ‘way’, ‘business’, ‘s’, ‘even’, ‘well’, ‘talk’, ‘get’, ‘free’, ‘professsional’, ‘guidance’, ‘think’, ‘people’, ‘stick’, ‘depression’, ‘cause’, ‘people’, ‘pursue’, ‘treatment’, ‘may’, ‘save’, ‘life’, ‘know’, ‘firsthand’, ‘appropriate’, ‘way’, ‘respond’, ‘tell’, ‘support’, ‘other’, ‘say’, ‘mother’, ‘first’, ‘tell’, ‘hit’, ‘would’, ‘sit’, ‘kitchen’, ‘table’, ‘cry’, ‘uncontrollably’, ‘start’, ‘hrt’, ‘right’, ‘take’, ‘nurse’, ‘year’, ‘tortuous’, ‘severe’, ‘depression’, ‘ask’, ‘do’, ‘hormone’, ‘panel’, ‘flabbergast’, ‘go’, ‘lowt’, ‘men’, ‘health’, ‘center’, ‘addition’, ‘find’, ‘severely’, ‘low’, ‘receive’, ‘great’, ‘man’, ‘health’, ‘care’, ‘know’, ‘funny’, ‘deduce’, ‘man’, ‘mid’, ‘life’, ‘crisis’, ‘hormone’, ‘imbalance’, ‘likely’, ‘low’, ‘get’, ‘ball’, ‘bust’, ‘buy’, ‘corvette’, ‘woman’, ‘get’, ‘sympathy’, ‘go’, ‘change’, ‘enough’, ‘question’, ‘come’, ‘first’, ‘opinion’, ‘testosterone’, ‘brain’, ‘get’, ‘testosterone’, ‘shot’, ‘help’, ‘put’, ‘dent’, ‘depression’, ‘make’, ‘feel’, ‘well’, ‘still’, ‘leave’, ‘pretty’, ‘severe’, ‘depression’, ‘admittedly’, ‘hormone’, ‘vitamin’, ‘could’, ‘do’, ‘aggressively’, ‘recently’, ‘do’, ‘put’, ‘brain’, ‘glide’, ‘path’, ‘depression’, ‘amazing’, ‘think’, ‘fix’, ‘fix’, ‘brain’, ‘still’, ‘aggressively’, ‘pursue’, ‘low’, ‘hear’, ‘cortisol’, ‘kill’, ‘testosterone’]

Neural Network Predicting Subreddit Likes and Comments for Mental Health Topics

Introduction:

Problem Description:

This project analyzes whether attributes of subreddit post content are predictive of the social support (karma points and number of comments) that posts receive.

Context of the Problem:

Mental health problems tend to be under-reported and under-addressed, which places a high socio-economic cost on society. Research shows that social support is valuable for improving the quality of life of people with mental illnesses.

This project examines which content attributes of anonymous social media posts on the reddit platform elicit higher levels of social support in the form of karma points and comments.

Limitations of Other Approaches:

We have examined the two most relevant papers on the topic, [1] and [2]. Neither Schrading, N. et al. [1] nor De Choudhury, M. & De, S. [2] use subreddit indicator variables (i.e., indicators for schizophrenia, depression, anxiety, etc.) in their analysis. Posts are likely treated differently depending on the mental illness indicated (as per Mann, C. E. & Himelein, M. J. [3], “stigmatization of schizophrenia was significantly higher than stigmatization of depression”). Also, De Choudhury, M. & De, S. [2] used a resource-intensive manual labelling approach to arrive at keywords.

Solution:

In this project, the analysis includes subreddit indicators in the neural network model that predicts social support for reddit posts. The figure below shows statistics for subreddit indicators for a sample dataset: the means of the target variables differ considerably between subreddits.

Additional inputs include counts of frequent bigrams and emotion labels for keywords. Emotion labelling was done with an NLP approach using an existing emotion lexicon.
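The per-subreddit comparison of target means can be sketched without any libraries. This is a sketch, not the author's code: the rows are made up, and the keys `score` and `num_comments` mirror the PRAW `Submission` attribute names. With pandas the same summary is `df.groupby("subreddit")[["score", "num_comments"]].mean()`.

```python
from collections import defaultdict

def mean_targets_by_subreddit(posts):
    """Mean score and number of comments for each subreddit."""
    totals = defaultdict(lambda: {"score": 0.0, "num_comments": 0.0, "n": 0})
    for post in posts:
        t = totals[post["subreddit"]]
        t["score"] += post["score"]
        t["num_comments"] += post["num_comments"]
        t["n"] += 1
    return {
        sub: {"score": t["score"] / t["n"], "num_comments": t["num_comments"] / t["n"]}
        for sub, t in totals.items()
    }

posts = [  # made-up example rows
    {"subreddit": "depression", "score": 10, "num_comments": 4},
    {"subreddit": "depression", "score": 20, "num_comments": 6},
    {"subreddit": "BPD", "score": 3, "num_comments": 1},
]
print(mean_targets_by_subreddit(posts))
# {'depression': {'score': 15.0, 'num_comments': 5.0}, 'BPD': {'score': 3.0, 'num_comments': 1.0}}
```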

Background:

Reference	Explanation	Dataset/Input	Weakness
Schrading, N. et al. [1]	They trained and compared multiple classifiers on the content of reddit posts to determine the top semantic and linguistic features for detecting abusive relationships.	Subreddit posts with comments that focus on domestic abuse, plus subreddit posts with comments unrelated to domestic abuse as a control set.	Future studies could use datasets from multiple websites to compare online abuse patterns across forums.
De Choudhury, M. & De, S. [2]	They trained a negative binomial regression model on the content of reddit posts (e.g., length, use of first-person pronouns, relationship words, emoticons, positive and negative words) to predict social support variables (karma points and number of responses).	Posts, comments and associated metadata from several mental health subreddits, including alcoholism, anxiety, bipolarreddit, depression, mentalhealth, MMFB (Make Me Feel Better), socialanxiety, SuicideWatch.	– Of the top 15 predicting variables discussed for the regression model, the intercept and the use of first-person pronouns have the highest coefficients. – There is no discussion of correlations between predicting variables (for example, the study uses variables such as negative emotion, positive emotion and number of emoticons, which could be correlated).

Methodology

Schrading, N. et al. [1] reported that, of the post features they analyzed, n-grams were the most predictive when detecting abusive relationships in reddit posts. De Choudhury, M. & De, S. [2] tried to predict social support variables for mental health related reddit posts using post length, emoticons, unigrams, variables based on the presence of emotionally charged unigrams, etc.

In this project, to predict the social support variables (score and number of comments) for mental health related reddit posts, we built a neural network model with the following predictive variables: emotionally charged unigrams as indicators of 10 different emotions, emotion counts, post length, part-of-speech frequencies (counts of verbs, pronouns, adverbs and adjectives), the count of first-person pronouns, the number of question marks, the count of frequent bigrams, and subreddit indicators.

Below is the list of inputs used in the models for predicting the score and number of comments:

‘anger’, ‘anticipation’, ‘disgust’, ‘fear’, ‘joy’, ‘negative’, ‘positive’, ‘sadness’, ‘surprise’, ‘trust’, ‘len_post’, ‘len_post_orig’, ‘first_pronoun_count’, ‘freq_bigram_count’, ‘q_count’, ‘verb_count’, ‘pronoun_count’, ‘adverb_count’, ‘adjective_count’, Subreddit(display_name=‘BipolarReddit’), Subreddit(display_name=‘Anxiety’), Subreddit(display_name=‘depression’), Subreddit(display_name=‘schizophrenia’), Subreddit(display_name=‘bipolar’), Subreddit(display_name=‘mentalhealth’), Subreddit(display_name=‘depression_help’), Subreddit(display_name=‘BPD’), Subreddit(display_name=‘socialanxiety’), Subreddit(display_name=‘mentalillness’)
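A few of the simpler inputs (`len_post`, `first_pronoun_count`, `q_count`) can be sketched with a regular expression. This is a hedged sketch, not the author's code. The source lists both `len_post` and `len_post_orig` without defining them, so this computes a single word count. The first-person pronoun set is an assumption. The POS counts are omitted because they need a tagger (spaCy's `token.pos_`, for example).

```python
import re

# Assumed pronoun set: the source does not list which first-person forms were counted.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll"}

def basic_features(text):
    """Word count, first-person pronoun count, and question-mark count."""
    # Normalize curly apostrophes, which are common in reddit text.
    words = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    return {
        "len_post": len(words),
        "first_pronoun_count": sum(1 for w in words if w in FIRST_PERSON),
        "q_count": text.count("?"),
    }

print(basic_features("I feel lost. Does my therapy even help? I'm tired."))
# {'len_post': 10, 'first_pronoun_count': 3, 'q_count': 1}
```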

Emotion lexicon

A public lexicon dataset was used to determine counts of specific emotion words. The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).

https://nrc.canada.ca/en/research-development/products-services/technical-advisory-services/sentiment-emotion-lexicons
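Counting emotion words against the lexicon can be sketched as follows. This assumes the word-level file's tab-separated `word, emotion, 0/1` layout. The toy lines below are invented for illustration and are not copied from the NRC file. The helper names are hypothetical.

```python
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "negative", "positive", "sadness", "surprise", "trust"]

def load_lexicon(lines):
    """Build word -> set of emotions from 'word<TAB>emotion<TAB>0/1' lines."""
    lexicon = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, emotion, flag = parts
        if flag == "1":
            lexicon.setdefault(word, set()).add(emotion)
    return lexicon

def emotion_counts(tokens, lexicon):
    """Count how many tokens are associated with each emotion."""
    counts = Counter(e for t in tokens for e in lexicon.get(t, ()))
    return {e: counts[e] for e in EMOTIONS}

# Toy lines for illustration only; they are not copied from the NRC file.
toy = ["afraid\tfear\t1", "afraid\tnegative\t1", "afraid\tjoy\t0",
       "happy\tjoy\t1", "happy\tpositive\t1",
       "alone\tsadness\t1", "alone\tnegative\t1"]
lex = load_lexicon(toy)
print(emotion_counts(["afraid", "happy", "alone", "today"], lex))
```

Each post then contributes one count per emotion, which matches the ten emotion inputs listed above.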

Below are examples of posts with the most frequent bigrams highlighted. The frequent bigrams ‘feel like’ and ‘feels like’ are consistent with the finding by De Choudhury, M. & De, S. [2] of frequent unigrams related to emotional expression.

N-grams

For this project, we identified the most popular bigrams and trigrams. Counts of the most frequent bigrams and trigrams were used while testing various models, and the most useful turned out to be the counts of the 16 most frequent bigrams, which became one of the model inputs.
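Selecting the most frequent bigrams and counting them per post can be sketched like this. This is a sketch under assumptions. The function names are hypothetical. `n=16` comes from the text above, but the idea that `freq_bigram_count` counts every occurrence of any frequent bigram in a post is an assumed reading of that variable name.

```python
from collections import Counter

def top_bigrams(docs, n=16):
    """The n most frequent adjacent token pairs across all documents."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    return [pair for pair, _ in counts.most_common(n)]

def freq_bigram_count(tokens, frequent):
    """How many adjacent pairs in one post are among the frequent bigrams."""
    frequent = set(frequent)
    return sum(1 for pair in zip(tokens, tokens[1:]) if pair in frequent)

docs = [["feel", "like", "crying"], ["feels", "like", "home"], ["feel", "like", "giving", "up"]]
frequent = top_bigrams(docs, n=1)
print(frequent)  # [('feel', 'like')]
print(freq_bigram_count(["i", "feel", "like", "i", "feel", "like", "it"], frequent))  # 2
```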

Below is the list of the most popular bigrams used and a few examples of their usage in raw texts.

Implementation

Data Collection

We obtained data via a public API from 10 mental health subreddits: “depression”, “anxiety”, “bipolarreddit”, “mentalhealth”, “socialanxiety”, “depression_help”, “bipolar”, “BPD”, “schizophrenia”, and “mentalillness”.

First, we checked 10 hot posts for each subreddit, and then collected the data. The resulting dataframes have the following dimensions (rows, columns):

top_posts: (9949, 9)

hot_posts: (9890, 9)

new_posts: (9896, 9)

Preparing the Data

Data scraping from reddit is limited to a maximum of 1000 records per subreddit for each of 3 post categories (“hot”, “top” and “new” posts). To maximize the dataset size, we collected posts from all 3 categories and removed duplicate records that appear in more than one category. As mentioned by De Choudhury, M. & De, S. [2], reddit posts receive most of their comments within the first 3 days after being posted. Thus, we removed posts that were less than 3 days old at the time of data collection.
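The merge, de-duplication and age filter can be sketched as below. The post dictionaries are hypothetical stand-ins. Their `id` and `created_utc` (Unix seconds) keys mirror PRAW `Submission` attributes. The 3-day threshold comes from the text.

```python
from datetime import datetime, timedelta, timezone

def prepare_posts(top, hot, new, collected_at, min_age_days=3):
    """Merge the three categories, drop duplicate ids, and drop posts that were
    younger than min_age_days when the data was collected."""
    cutoff = collected_at - timedelta(days=min_age_days)
    seen, kept = set(), []
    for post in top + hot + new:
        if post["id"] in seen:
            continue  # same post listed in more than one category
        seen.add(post["id"])
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        if created <= cutoff:
            kept.append(post)
    return kept

def ts(*args):
    """Unix timestamp for a UTC date (helper for the example)."""
    return datetime(*args, tzinfo=timezone.utc).timestamp()

collected = datetime(2021, 3, 10, tzinfo=timezone.utc)
top = [{"id": "a", "created_utc": ts(2021, 3, 1)}]
hot = [{"id": "a", "created_utc": ts(2021, 3, 1)}, {"id": "b", "created_utc": ts(2021, 3, 9)}]
new = [{"id": "c", "created_utc": ts(2021, 3, 5)}]
print([p["id"] for p in prepare_posts(top, hot, new, collected)])  # ['a', 'c']
```

Post "a" appears twice but is kept once, and post "b" is dropped because it was only one day old at collection time.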

Removing stop words and punctuation
Creating n-grams (bigrams, trigrams and four-grams)
Applying smoothing for trigrams and removing extra words referring to posts unrelated to this analysis (i.e., moderator posts)
Creating the emotions dataframe, counting POS (part-of-speech) tags, and creating topic/subreddit dummies
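The subreddit dummies from the last step amount to one-hot encoding. The sketch below uses plain strings in place of the PRAW `Subreddit` objects that appear in the input list, and uses only a subset of categories for brevity. With pandas, `pandas.get_dummies` does the same job.

```python
def subreddit_dummies(subreddits, categories):
    """One indicator column per subreddit name, in the order of `categories`."""
    unknown = set(subreddits) - set(categories)
    if unknown:
        raise ValueError(f"unexpected subreddits: {sorted(unknown)}")
    return [[1 if s == c else 0 for c in categories] for s in subreddits]

categories = ["BipolarReddit", "Anxiety", "depression", "BPD"]  # subset, for brevity
print(subreddit_dummies(["depression", "BPD", "depression"], categories))
# [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```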

Reddit score prediction model, results based on first-layer weights:
In a multi-layer neural network it is hard to interpret raw internal weights, but it appears that mental health-specific variables (such as indicators for fear or surprise, or subreddit indicators) are more important than generic ones (such as verb count or post length, which appears to be the least useful). In particular, most subreddit indicators (“depression_help”, “depression”, “schizophrenia”, etc.), which were not used in the papers reviewed above, are in the top 10 by total weight.
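The source does not define "total weight". One common reading is the sum of the absolute weights leaving each input in the first layer, and the sketch below assumes that reading. It also assumes the kernel is laid out as `(n_inputs, n_units)`, as in a Keras `Dense` layer. Such totals are only comparable when the inputs are on similar scales (standardized, for instance). The weights and names in the example are made up.

```python
def rank_inputs_by_weight(first_layer, input_names, top=10):
    """Rank inputs by the sum of absolute weights leaving them in the first layer.

    first_layer[i][j] is the weight from input i to hidden unit j.
    """
    totals = {
        name: sum(abs(w) for w in row)
        for name, row in zip(input_names, first_layer)
    }
    return sorted(totals, key=totals.get, reverse=True)[:top]

weights = [[0.5, -0.5], [0.1, 0.1], [-2.0, 0.0]]  # made-up 3-input, 2-unit layer
names = ["fear", "verb_count", "depression_help"]
print(rank_inputs_by_weight(weights, names))  # ['depression_help', 'fear', 'verb_count']
```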

Conclusion and Future Direction

In conclusion, the neural network results showed that the model inputs do have some predictive power for the social response variables ‘number of comments’ and ‘score’, as the sums of weights for the input variables were found to be greater than zero. Also, during model testing, starting from fewer input variables and then adding the rest reduced the mean absolute error.
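The comparison above relies on the mean absolute error, MAE = (1/n) Σ |y_i − ŷ_i|. Here is a small sketch with made-up predictions from a model with fewer inputs and a model with all inputs. The numbers are not the author's results.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - y_hat_i|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y = [10, 3, 25]            # made-up observed scores
fewer_inputs = [6, 6, 15]  # made-up predictions, smaller input set
all_inputs = [9, 4, 20]    # made-up predictions, full input set
print(mean_absolute_error(y, fewer_inputs))  # about 5.67
print(mean_absolute_error(y, all_inputs))    # about 2.33
```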

One future improvement for this analysis could be a variable indicating whether the post comes from a throwaway account or an existing long-term reddit account, since De Choudhury, M. & De, S. [2] note that reddit’s throwaway accounts allow individuals to express themselves more honestly and to ‘discuss uninhibited feelings’.

Also, while the content and length of post titles and how users act on posts (click, read, and reply) might affect a post’s score, neither the research papers cited nor this analysis used title analysis as part of the model. Adding title attributes and post interaction statistics to the model is therefore a potential area for improvement.

References:

[1]: Schrading, N., Alm, C. O., Ptucha, R., & Homan, C. M. An Analysis of Domestic Abuse Discourse on Reddit. The 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, September 2015, pages 2577-2583.

[2]: De Choudhury, M. & De, S. Mental Health Discourse on reddit: Self-Disclosure, Social Support, and Anonymity. Eighth International AAAI Conference on Weblogs and Social Media, North America, May 2014, pages 71-80. Available at: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8075/8107.

[3]: Mann, C. E. & Himelein, M. J. Factors Associated with Stigmatization of Persons with Mental Illness. Psychiatric Services, Vol. 55, No. 2, February 2004, pages 185-197. Available at: https://ps.psychiatryonline.org/doi/pdf/10.1176/appi.ps.55.2.185.