NLP – Word Embeddings – ELMo

ELMo (Embeddings from Language Models) is a deep learning approach for representing words as vectors (also called word embeddings). It was developed by researchers at the Allen Institute for Artificial Intelligence and introduced in a paper published in 2018.

ELMo represents words as contextualized embeddings, meaning that the embedding for a word can change based on the context in which it is used. For example, the word “bank” could have different embeddings depending on whether it is used to refer to a financial institution or the edge of a river.

ELMo has been shown to improve the performance of a variety of natural language processing tasks, including language translation, question answering, and text classification. It has become a popular approach for representing words in NLP models, and the trained ELMo embeddings are freely available for researchers to use.

How does ELMo differ from Word2Vec or GloVe?

ELMo (Embeddings from Language Models) is a deep learning approach for representing words as vectors (also called word embeddings). It differs from other word embedding approaches, such as Word2Vec and GloVe, in several key ways:

  • Contextualized embeddings: ELMo represents words as contextualized embeddings, meaning that the embedding for a word can change based on the context in which it is used. In contrast, Word2Vec and GloVe represent words as static embeddings, which do not take into account the context in which the word is used.
  • Deep learning approach: ELMo uses a deep learning model, specifically a bidirectional language model built from LSTM layers, to generate word embeddings. Word2Vec and GloVe, on the other hand, use shallower approaches: Word2Vec trains a shallow neural network, and GloVe factorizes a matrix of word co-occurrence statistics.

To generate context-dependent embeddings, ELMo uses a bi-directional Long Short-Term Memory (LSTM) network trained with a language modeling objective. The LSTM processes the input sentence in both directions (left to right and right to left) and generates an embedding for each word based on its context in the sentence.

Overall, ELMo is a newer approach for representing words as vectors that has been shown to improve the performance of a variety of natural language processing tasks. It has become a popular choice for representing words in NLP models.

What is the model for training ELMo word embeddings?

The model used to train ELMo word embeddings is a bidirectional language model, which is a type of neural network trained to predict a word in a sentence given the context of the words that come before and after it. To train the ELMo model, researchers at the Allen Institute for Artificial Intelligence used a large dataset of text, such as news articles, books, and websites. During training, the model learns to represent words as vectors (also called word embeddings) that capture the meaning of each word in the context of the sentence.

Explain in details the bidirectional language model

A bidirectional language model is a type of neural network that is trained to predict a word in a sentence given the context of the words that come before and after it. It is called a “bidirectional” model because it takes into account the context of words on both sides of the word being predicted.

To understand how a bidirectional language model works, it is helpful to first understand how a unidirectional language model works. A unidirectional language model is a type of neural network that is trained to predict the next word in a sentence given the context of the words that come before it.

A unidirectional language model can be represented by the following equation:

P(w[t] | w[1], w[2], …, w[t-1]) = f(w[t-1], w[t-2], …, w[1])

This equation says that the probability of a word w[t] at time t (where time is the position of the word in the sentence) is determined by a function f of the words that come before it (w[t-1], w[t-2], …, w[1]). The function f is learned by the model during training.

A bidirectional language model extends this equation by also taking into account the context of the words that come after the word being predicted:

P(w[t] | w[1], w[2], …, w[t-1], w[t+1], w[t+2], …, w[n]) = f(w[t-1], w[t-2], …, w[1], w[t+1], w[t+2], …, w[n])

This equation says that the probability of a word w[t] at time t is determined by a function f of the words that come before it and the words that come after it. The function f is learned by the model during training.

In practice, a bidirectional language model is implemented as a neural network with two components: a forward layer that processes the input words from left to right (w[1], w[2], …, w[t-1]), and a backward layer that processes the input words from right to left (w[n], w[n-1], …, w[t+1]). The outputs of these two components are then combined and used to predict the word at position t (w[t]). The forward and backward layers are typically implemented as recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, which are designed to process sequences of data.

During training, the bidirectional language model is fed a sequence of words and is trained to predict each word in the sequence. The model uses the output of the forward and backward layers to generate a prediction, and this prediction is compared to the actual word at that position. The model’s weights are then updated to minimize the difference between the prediction and the actual word, and this process is repeated for each word in the training dataset. After training, the bidirectional language model can be used to generate word embeddings by extracting the outputs of the forward and backward layers for each word in the input sequence.

ELMo model training algorithm

  1. Initialize the word vectors:
  • The word vectors are usually initialized randomly using a Gaussian distribution.
  • Alternatively, you can use pre-trained word vectors such as Word2Vec or GloVe.
  2. Process the input sequence:
  • Input the sequence of words w[1], w[2], ..., w[t-1] into the forward layer and the backward layer.
  • The forward layer processes the words from left to right, and the backward layer processes the words from right to left.
  • Each layer has its own set of weights and biases, which are updated during training.
  3. Compute the output:
  • The output of the forward layer and the backward layer are combined to form the final output o[t].
  • The final output is used to predict the next word w[t].
  4. Compute the loss:
  • The loss is computed as the difference between the predicted word w[t] and the true word w[t].
  • The loss function is usually the cross-entropy loss, which measures the difference between the predicted probability distribution and the true probability distribution.
  5. Update the weights and biases:
  • The weights and biases of the forward and backward layers are updated using gradient descent and backpropagation.
  6. Repeat steps 2-5 for all words in the input sequence.
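
The steps above can be sketched in code. Below is a minimal, illustrative PyTorch version of a single training step for a toy biLSTM language model; it is not the actual ELMo implementation (which uses character convolutions, two stacked LSTM layers, and tied softmax weights), and all sizes and data here are made up.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)              # step 1: initialize word vectors
fwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # forward layer (left to right)
bwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # backward layer (right to left)
fwd_out = nn.Linear(hidden_dim, vocab_size)                  # predicts the next word
bwd_out = nn.Linear(hidden_dim, vocab_size)                  # predicts the previous word

params = (list(embedding.parameters()) + list(fwd_lstm.parameters()) +
          list(bwd_lstm.parameters()) + list(fwd_out.parameters()) +
          list(bwd_out.parameters()))
optimizer = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 6))                # a toy "sentence" of 6 token ids

# Step 2: process the sequence in both directions.
emb = embedding(tokens)
h_fwd, _ = fwd_lstm(emb)                                     # states reading left to right
h_bwd, _ = bwd_lstm(emb.flip(dims=[1]))                      # states reading right to left
h_bwd = h_bwd.flip(dims=[1])                                 # re-align to original positions

# Step 3: compute the outputs. The forward state at position t predicts w[t+1],
# and the backward state at position t predicts w[t-1].
fwd_logits = fwd_out(h_fwd[:, :-1, :])
bwd_logits = bwd_out(h_bwd[:, 1:, :])

# Step 4: cross-entropy loss for both directions.
loss = (loss_fn(fwd_logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)) +
        loss_fn(bwd_logits.reshape(-1, vocab_size), tokens[:, :-1].reshape(-1)))

# Step 5: update the weights and biases with backpropagation and gradient descent.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```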

ELMo generates contextualized word embeddings by combining the hidden states of a bi-directional language model (biLM) in a specific way.

The biLM consists of a forward component that processes the input words from left to right and a backward component that processes them from right to left. The hidden state of the biLM at each position t is a vector h[t] (the concatenation of the forward and backward states) that represents the context of the word at that position.

To generate the contextualized embedding for a word, ELMo combines the hidden states from all layers of the biLM (each layer’s state is the concatenation of the forward and backward hidden states) using a task-specific weighted sum. The weighting is controlled by a set of softmax-normalized layer weights s_task and a scalar scale parameter γ_task, both learned for the downstream task. The ELMo embedding for the word at position k is computed as a weighted sum of the hidden states from all L+1 layers of the biLM:

ELMo_task_k = E(R_k; Θ_task) = γ_task * (s_task_0 * h_LM_k,0 + s_task_1 * h_LM_k,1 + … + s_task_L * h_LM_k,L)

Here, h_LM_k,j represents the hidden state at position k and layer j of the biLM (layer 0 is the context-independent token representation), s_task_j are the softmax-normalized layer weights, and γ_task is a scalar that scales the entire resulting vector. These task-specific parameters are learned during training of the downstream model, so the hidden states are combined in a way that is optimal for the target task.
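
As a small numerical illustration of this weighted sum (with random stand-in hidden states rather than a real biLM), the combination can be computed like this:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 2                      # number of biLM LSTM layers (plus layer 0, the token representation)
dim = 8                    # dimensionality of each (concatenated forward+backward) hidden state
num_tokens = 5             # length of the input sentence

# h[j, k] is the hidden state of layer j at position k (random stand-ins here).
h = rng.normal(size=(L + 1, num_tokens, dim))

# Task-specific parameters: raw layer scores and a scalar scale, learned with the downstream task.
raw_layer_scores = rng.normal(size=L + 1)
gamma_task = 1.0

# Softmax-normalize the layer scores to get the weights s_task_j.
s_task = np.exp(raw_layer_scores) / np.exp(raw_layer_scores).sum()

# ELMo_task_k = gamma_task * sum_j s_task_j * h[j, k]  for every position k.
elmo = gamma_task * np.tensordot(s_task, h, axes=(0, 0))   # shape: (num_tokens, dim)
print(elmo.shape)
```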

Using ELMo for NLP tasks

ELMo can be used to improve the performance of supervised NLP tasks by providing context-dependent word embeddings that capture not only the meaning of the individual words, but also their context in the sentence.

To use a pre-trained bi-directional language model (biLM) for a supervised NLP task, the first step is to run the biLM and record the layer representations for each word in the input sequence. These layer representations capture the context-dependent information about the words in the sentence, and can be used to augment the context-independent token representation of each word.

In most supervised NLP models, the lowest layers are shared across different tasks, and the task-specific information is encoded in the higher layers. This allows ELMo to be added to the model in a consistent and unified manner, by simply concatenating the ELMo embeddings with the context-independent token representation of each word.

The model then combines the ELMo embeddings with the context-independent token representation to form a context-sensitive representation h_k, typically using either bidirectional RNNs, CNNs, or feed-forward networks. The context-sensitive representation h_k is then used as input to the higher layers of the model, which are task-specific and encode the information needed to perform the target NLP task. It can be helpful to add a moderate amount of dropout to ELMo and to regularize the ELMo weights by adding a regularization term to the loss function. This can help to prevent overfitting and improve the generalization ability of the model.
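
A minimal sketch of this kind of integration, assuming ELMo vectors for each token have already been computed somewhere upstream; the dimensions and the downstream BiLSTM tagger are invented for the example:

```python
import torch
import torch.nn as nn

batch, seq_len = 2, 7
token_dim, elmo_dim, hidden_dim, num_labels = 100, 1024, 128, 5

token_repr = torch.randn(batch, seq_len, token_dim)    # context-independent token embeddings
elmo_repr = torch.randn(batch, seq_len, elmo_dim)      # ELMo vectors for the same tokens

elmo_dropout = nn.Dropout(p=0.5)                       # moderate dropout applied to ELMo
x = torch.cat([token_repr, elmo_dropout(elmo_repr)], dim=-1)

# Task model: a BiLSTM produces the context-sensitive representation h_k,
# followed by a task-specific classification layer for each token.
encoder = nn.LSTM(token_dim + elmo_dim, hidden_dim, batch_first=True, bidirectional=True)
classifier = nn.Linear(2 * hidden_dim, num_labels)

h, _ = encoder(x)
logits = classifier(h)                                 # shape: (batch, seq_len, num_labels)

# Regularizing the ELMo layer-mixing weights would add a term like
# lambda * (mixing_weights ** 2).sum() to the task loss (omitted here).
print(logits.shape)
```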

NLP – Word Embeddings – FastText

What is the FastText method for word embeddings?

FastText is a library for efficient learning of word representations and sentence classification. It was developed by Facebook AI Research (FAIR).

FastText represents each word in a document as a bag of character n-grams. For example, with n-grams of length 3, the word “apple” (padded with the boundary markers “<” and “>”) would be represented by the character n-grams “<ap”, “app”, “ppl”, “ple”, “le>”, together with the whole word itself. This representation has two advantages:

  1. It can handle spelling mistakes and out-of-vocabulary words. For example, the model would still be able to understand the word “apple” even if it was misspelled as “appel” or “aple”.
  2. It can handle words in different languages with the same script (e.g., English and French) without the need for a separate model for each language.

FastText uses a shallow neural network to learn the word representations from this character n-gram representation. It is trained using the skip-gram model with negative sampling, similar to word2vec.

FastText can also be used for sentence classification by averaging the word vectors for the words in the sentence and training a linear classifier on top of the averaged vector. It is particularly useful for languages with a large number of rare words, or in cases where using a word’s subwords (also known as substrings or character n-grams) as features can be helpful.

How are word embeddings trained in FastText?

Word embeddings in FastText can be trained using either the skip-gram model or the continuous bag-of-words (CBOW) model.

In the skip-gram model, the goal is to predict the context words given a target word. For example, given the input sequence “I have a dog”, the goal would be to predict “have” and “a” given the target word “I”, and to predict “I” given the target word “have”. The skip-gram model learns to predict the context words by minimizing the negative log likelihood of the context words given the target word.

In the CBOW model, the goal is to predict the target word given the context words. For example, given the input sequence “I have a dog”, the goal would be to predict “I” given the context words “have” and “a”, and to predict “have” given the context words “I” and “a”. The CBOW model learns to predict the target word by minimizing the negative log likelihood of the target word given the context words.

Both the skip-gram and CBOW models are trained using stochastic gradient descent (SGD) and backpropagation to update the model’s parameters. The model is trained by minimizing the negative log likelihood of the words in the training data, given the model’s parameters.

Explain how FastText represents each word in a document as a bag of character n-grams

To represent a word as a bag of character n-grams, FastText breaks the word down into overlapping substrings (also known as character n-grams). For example, the word “apple” could be represented as the following character 3-grams (trigrams): [“app”, “ppl”, “ple”]. The number of characters in each substring is specified by the user and is typically set to between 3 and 6 characters.

For example, consider the following sentence:

“I have a dog”

If we set the number of characters in each substring to 3, FastText would represent each word in the sentence as follows:

“I”: [“I”]
“have”: [“hav”, “ave”]
“a”: [“a”]
“dog”: [“dog”]

The use of character n-grams allows FastText to learn good vector representations for rare words, as it can use the vector representations of the character n-grams that make up the rare word to compute its own vector representation. This is particularly useful for handling out-of-vocabulary words that may not have a pre-trained vector representation available.
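
A small helper that enumerates character n-grams in the way described above; the “<” and “>” boundary markers and the 3–6 length range follow FastText’s usual conventions, but the function itself is only an illustration:

```python
def char_ngrams(word, min_n=3, max_n=6, add_boundaries=True):
    """Return the character n-grams of `word` with lengths min_n..max_n."""
    token = f"<{word}>" if add_boundaries else word
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

# Without boundary markers this reproduces the trigram example from the text.
print(char_ngrams("have", min_n=3, max_n=3, add_boundaries=False))  # ['hav', 'ave']
print(char_ngrams("apple", min_n=3, max_n=3))                       # ['<ap', 'app', 'ppl', 'ple', 'le>']
```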

How are vector representations for each word computed from n-gram vectors?

In FastText, the vector representation for each word is computed as the sum of the vector representations of the character n-grams (subwords) that make up the word. For example, consider the following sentence:

“I have a dog”

If we set the number of characters in each substring to 3, FastText would represent each word in the sentence as a bag of character 3-grams (trigrams) as follows:

“I”: [“I”]
“have”: [“hav”, “ave”]
“a”: [“a”]
“dog”: [“dog”]

FastText would then learn a vector representation for each character n-gram and use these vector representations to compute the vector representation for each word. For example, the vector representation for the word “have” would be computed as the sum of the vector representations for the character n-grams [“hav”, “ave”].
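
Continuing that illustration with random stand-in vectors in place of learned ones, the word vector is simply the sum of its subword vectors:

```python
import numpy as np

def trigrams(word):
    # Character 3-grams without boundary markers, matching the example in the text.
    return [word[i:i + 3] for i in range(len(word) - 2)]

rng = np.random.default_rng(0)
dim = 10

# Pretend these subword vectors were learned during training.
ngram_vectors = {ng: rng.normal(size=dim) for ng in trigrams("have") + trigrams("dog")}

def word_vector(word):
    # Sum the vectors of the word's character n-grams (real FastText also adds a
    # vector for the whole word itself and handles unseen n-grams via hashing).
    return np.sum([ngram_vectors[ng] for ng in trigrams(word)], axis=0)

print(word_vector("have").shape)   # (10,) - the sum of the vectors for "hav" and "ave"
```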

Since there can be a huge number of unique n-grams, how does FastText deal with the memory requirement?

One of the ways that FastText deals with the large number of unique character n-grams is by using hashing to map the character n-grams to a fixed-size hash table rather than storing them in a dictionary. This allows FastText to store the character n-grams in a compact form, which can save memory.

What is hashing? How are character sequences hashed to integer values?

Hashing is the process of converting a given input (called the ‘key’) into a fixed-size integer value (called the ‘hash value’ or ‘hash code’). The key is typically some sort of string or sequence of characters, but it can also be a number or other data type.

There are many different ways to hash a character sequence, but most algorithms work by taking the input key, performing some mathematical operations on it, and then returning the hash value as an integer. The specific mathematical operations used will depend on the specific hashing algorithm being used.

One simple example of a hashing algorithm is the ‘modulo’ method, which works as follows:

  1. Take the input key and convert it into a numerical value, for example by assigning each character in the key a numerical value based on its ASCII code.
  2. Divide this numerical value by the size of the hash table (the data structure in which the hashed keys will be stored).
  3. The remainder of this division is the hash value for the key.

This method is simple and fast, but it is not very robust and can lead to a high number of collisions (when two different keys produce the same hash value). More sophisticated algorithms are typically used in practice to improve the performance and reliability of hash tables.
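
A toy version of the modulo method described above; real FastText instead uses a stronger hash (a variant of FNV-1a) and maps each n-gram into one of a fixed number of buckets, but the weakness of the naive approach is easy to see even at this scale:

```python
def modulo_hash(key, table_size):
    # 1. Convert the key to a number by summing the ASCII codes of its characters.
    numeric_value = sum(ord(ch) for ch in key)
    # 2./3. Divide by the table size and keep the remainder as the hash value.
    return numeric_value % table_size

num_buckets = 20  # tiny table so collisions are easy to see
for ngram in ["hav", "ave", "dog", "god"]:
    print(ngram, "->", modulo_hash(ngram, num_buckets))
# "dog" and "god" collide: they contain the same characters, so they sum to the same value.
```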

How is the Skip-gram with negative sampling applied in FastText?

Skip-gram with negative sampling (SGNS) algorithm is used to learn high-quality word embeddings (i.e., dense, low-dimensional representations of words that capture the meaning and context of the words). The Skip-gram with negative sampling algorithm works by training a predictive model to predict the context words (i.e., the words that appear near a target word in a given text) given the target word. During training, the model is given a sequence of word pairs (a target word and a context word) and tries to predict the context words given the target words.

To train the model, the SGNS algorithm uses a technique called negative sampling, which involves sampling a small number of negative examples (random words that are not the true context words) and using them to train the model along with the positive examples (the true context words). This helps the model to learn the relationship between the target and context words more efficiently by focusing on the most informative examples.

The steps of the SGNS algorithm are as follows:

  • The embedding for a target word (also called the ‘center word’) is calculated by taking the sum of the embeddings for the word itself and the character n-grams that make up the word.
  • The context words are represented by their word embeddings, without adding the character n-grams.
  • Negative samples are selected randomly from the vocabulary during training, with the probability of selecting a word typically proportional to its unigram frequency (i.e., the number of times it appears in the text) raised to the 3/4 power, the same heuristic used in word2vec.
  • The dot product of the embedding for the center word and the embedding for each context word (and each negative sample) is calculated and passed through a sigmoid function to give the probability that the pair is a true (center, context) pair. Because only a few negative samples are scored, there is no need to normalize over the entire vocabulary, which is what makes negative sampling efficient.
  • Compute the cross-entropy loss between the predicted and true context words. Use an optimization algorithm such as stochastic gradient descent (SGD) to update the embedding vectors in order to minimize this loss. This involves bringing the actual context words closer to the center word (i.e., the target word) and increasing the distance between the center word and the negative samples.

    The cross-entropy loss function can be expressed as:
  • L = – ∑i(y_i log(p(w_i|c)) + (1 – y_i)log(1 – p(w_i|c)))
  • where:
  • L is the cross-entropy loss.
  • y_i is a binary variable indicating whether context word i is a positive example (y_i = 1) or a negative example (y_i = 0).
  • p(w_i|c) is the probability of context word i given the target word c and its embedding.
  • ∑i indicates that the sum is taken over the true context words and the sampled negative examples.
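
A minimal numpy sketch of a single SGNS update following the steps above, using one positive pair, two negative samples, and random toy vectors (a real implementation would loop over a corpus and build the center vector from the word and its n-grams, as described earlier):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 10, 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

center = rng.normal(scale=0.1, size=dim)                 # center word vector (word + n-grams summed)
context = {"pos": rng.normal(scale=0.1, size=dim),       # true context word
           "neg1": rng.normal(scale=0.1, size=dim),      # negative samples
           "neg2": rng.normal(scale=0.1, size=dim)}
labels = {"pos": 1.0, "neg1": 0.0, "neg2": 0.0}

# Cross-entropy (negative sampling) loss over the positive pair and the negatives.
loss = 0.0
grads_center = np.zeros(dim)
for name, vec in context.items():
    p = sigmoid(center @ vec)                            # predicted probability of being a true pair
    y = labels[name]
    loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Gradient of the loss w.r.t. each vector is (p - y) times the other vector.
    grads_center += (p - y) * vec
    context[name] = vec - lr * (p - y) * center          # update context / negative vectors

center -= lr * grads_center                              # update the center word vector
print("loss:", float(loss))
```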

FastText and hierarchical softmax

FastText can use a technique called hierarchical softmax to reduce the computation time during training. Hierarchical softmax works by organizing the vocabulary into a binary tree, with the words as the leaves of the tree, arranged according to their probability of occurrence.

During training, the model uses the hierarchical structure of the tree to compute the loss and update the model weights more efficiently. This is done by traversing the tree from the root to the appropriate leaf node for each word, rather than computing the loss and updating the weights for every word in the vocabulary separately.

The standard softmax function has a computational complexity of O(Kd), where K is the number of classes (i.e., the size of the vocabulary) and d is the number of dimensions in the hidden layer of the model. This complexity arises from the need to normalize the probabilities over all potential classes in order to obtain a valid probability distribution. The hierarchical softmax reduces the computational complexity to O(d*log(K)). Huffman coding can be used to construct a binary tree structure for the softmax function, where the lowest frequency classes are placed deeper into the tree and the highest frequency classes are placed near the root of the tree.

In the hierarchical softmax function, a probability is calculated for each path through the Huffman tree, based on the dot product of the output vector v_n_i of each inner node n along the path and the output value of the hidden layer of the model, h. The sigmoid function is then applied to each dot product to obtain a probability between 0 and 1.

The idea of this method is to represent the output classes (i.e., the words in the vocabulary) as the leaves of the tree and to assign a probability to each class based on the path from the root of the tree to the leaf corresponding to that class. The probability of a certain class is calculated as the product of the probabilities of the branch decisions along that path.

This allows the hierarchical softmax function to compute the probability of each class more efficiently, since it only needs to consider the path through the tree rather than the entire vocabulary. This can significantly reduce the computational complexity of the model, particularly for large vocabularies, making it practical to train word embeddings on very large datasets.

Hierarchical softmax and conditional probabilities

To compute the probability of each context word given the center word and its embedding using the hierarchical softmax function, we first organize the vocabulary into a binary tree, with the words at the leaves of the tree, arranged according to their probability of occurrence.

We then compute the probability of each context word by traversing the tree from the root to the appropriate leaf node for the word. For each inner node n in the tree, we compute the probability of traversing the left or right branch of the tree as follows:

p(left|n) = sigmoid(v_n_i · h)
p(right|n) = 1 – p(left|n)

where:

  • v_n_i is the vector representation of inner node n
  • h is the output value of the hidden layer of the model

The probability of a context word w is then computed as the product of the probabilities of the branches along the path from the root to the leaf node corresponding to w.
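
A toy illustration of this computation, with a hand-built tree over a four-word vocabulary, random inner-node vectors, and hard-coded left/right paths:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = rng.normal(size=dim)                      # hidden-layer output for the current center word

# Three inner nodes are enough for a 4-word vocabulary arranged as a binary tree.
inner_vectors = {"root": rng.normal(size=dim),
                 "n_left": rng.normal(size=dim),
                 "n_right": rng.normal(size=dim)}

# Each word is defined by its path from the root: (inner node, branch taken).
paths = {"cat":  [("root", "left"), ("n_left", "left")],
         "dog":  [("root", "left"), ("n_left", "right")],
         "bank": [("root", "right"), ("n_right", "left")],
         "tree": [("root", "right"), ("n_right", "right")]}

def word_probability(word):
    prob = 1.0
    for node, branch in paths[word]:
        p_left = sigmoid(inner_vectors[node] @ h)       # p(left | n) = sigmoid(v_n · h)
        prob *= p_left if branch == "left" else (1.0 - p_left)
    return prob

probs = {w: word_probability(w) for w in paths}
print(probs)
print("sum over vocabulary:", sum(probs.values()))       # sums to 1 by construction
```

Because each split divides its parent’s probability between two children, the probabilities of all leaves sum to 1 without ever normalizing over the whole vocabulary.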

NLP – Word Embeddings – GloVe

What are word embeddings?

Word embeddings are a type of representation for text data, which allows words with similar meaning to have a similar representation in a neural network model. Word embeddings are trained such that words that are used in similar contexts will have similar vectors in the embedding space. This is useful because it allows the model to generalize better and makes it easier to learn from smaller amounts of data. Word embeddings can be trained using a variety of techniques, such as word2vec and GloVe, and are commonly used as input to deep learning models for natural language processing tasks.

So are they represented as arrays of numbers?

Yes, word embeddings are typically represented as arrays of numbers. The length of the array will depend on the size of the embedding space, which is a parameter that is chosen when the word embeddings are created. For example, if the size of the embedding space is 50, each word will be represented as a vector of length 50, with each element of the vector representing a dimension in the embedding space.

In a neural network model, these word embedding vectors are typically fed into the input layer of the model, and the rest of the layers in the model are then trained to perform some task, such as language translation or sentiment analysis. The model learns to combine the various dimensions of the word embedding vectors in order to make predictions or decisions based on the input data.

How are word embeddings determined?

There are a few different techniques for determining word embeddings, but the most common method is to use a neural network to learn the embeddings from a large dataset of text. The basic idea is to train the network to predict a word from the words that appear around it in a sentence (or vice versa). The network is trained on a large dataset of text, and the learned weights of its embedding layer are used as the embeddings for each word.

There are a few different variations on this basic approach, such as using a different objective function or incorporating additional information into the input to the network. The specific details of how word embeddings are determined will depend on the specific method being used.

What are the specific methods for generating word embeddings?

Word embeddings are a type of representation for natural language processing tasks in which words are represented as numerical vectors in a high-dimensional space. There are several algorithms for generating word embeddings, including:

  1. Word2Vec: This algorithm uses a neural network to learn the vector representations of words. It can be trained using two different techniques: continuous bag-of-words (CBOW) and skip-gram.
  2. GloVe (Global Vectors): This algorithm learns word embeddings by factorizing a matrix of word co-occurrence statistics.
  3. FastText: This is an extension of Word2Vec that learns word embeddings for subwords (character n-grams) in addition to full words. This allows the model to better handle rare and out-of-vocabulary words.
  4. ELMo (Embeddings from Language Models): This algorithm generates word embeddings by training a deep bi-directional language model on a large dataset. The word embeddings are then derived from the hidden state of the language model.
  5. BERT (Bidirectional Encoder Representations from Transformers): This algorithm is a transformer-based language model that generates contextual word embeddings. It has achieved state-of-the-art results on a wide range of natural language processing tasks.

What is the word2vec CBOW model?

The continuous bag-of-words (CBOW) model is one of the two main techniques used to train the Word2Vec algorithm. It predicts a target word based on the context words, which are the words surrounding the target word in a text.

The CBOW model takes a window of context words as input and predicts the target word in the center of the window. The input to the model is a one-hot vector representation of the context words, and the output is a probability distribution over the words in the vocabulary. The model is trained to maximize the probability of predicting the correct target word given the context words.

During training, the model adjusts the weights of the input-to-output connections in order to minimize the prediction error. Once training is complete, the model can be used to generate word embeddings for the words in the vocabulary. These word embeddings capture the semantic relationships between words and can be used for various natural language processing tasks.

What is the word2vec skip-gram model?

The skip-gram model is the other main technique used to train the Word2Vec algorithm. It is the inverse of the continuous bag-of-words (CBOW) model, which predicts a target word based on the context words. In the skip-gram model, the target word is used to predict the context words.

Unlike the CBOW model, the skip-gram model takes the target word as input and predicts the words in a window around it. The input to the model is a one-hot vector representation of the target word, and the output is a probability distribution over the words in the vocabulary. The model is trained to maximize the probability of predicting the correct context words given the target word.

During training, the model adjusts the weights of the input-to-output connections in order to minimize the prediction error. Once training is complete, the model can be used to generate word embeddings for the words in the vocabulary. These word embeddings capture the semantic relationships between words and can be used for various natural language processing tasks.

What are the steps for the GloVe algorithm?

GloVe learns word embeddings by factorizing a matrix of word co-occurrence statistics, which can be calculated from a large corpus of text.

The main steps of the GloVe algorithm are as follows:

  1. Calculate the word co-occurrence matrix: Given a large corpus of text, the first step is to calculate the co-occurrence matrix, which is a matrix X where each element X_ij represents the number of times word j appears in the context of word i. The context of a word is typically defined as a window of words around it.
  2. Initialize the vectors: The next step is to initialize the word vectors and context vectors (the rows of the matrices W and W̃), along with a bias term for each word. The vectors are initialized with small random values.
  3. Define the objective: GloVe fits the dot product of each word vector and context vector (plus the two bias terms) to the logarithm of the corresponding co-occurrence count X_ij, using a weighted least-squares objective. The weighting function down-weights very rare co-occurrences and caps the influence of very frequent ones.
  4. Minimize the objective: The objective is minimized with stochastic gradient descent (the original implementation uses AdaGrad), updating the word vectors, context vectors, and biases.
  5. Combine the vectors: After training, the word vector and context vector for each word are typically summed to produce the final embedding.

Once the GloVe algorithm has been trained, the word vectors can be used to represent words in a high-dimensional space. The word vectors capture the semantic relationships between words and can be used for various natural language processing tasks.

How is the matrix factorization performed in GloVe? What is the goal?

The goal of matrix factorization in GloVe is to find two matrices, called the word matrix and the context matrix, such that the dot products of their rows (plus bias terms) approximate the logarithm of the co-occurrence matrix. The word matrix contains the word vectors for each word in the vocabulary, and the context matrix contains the context vectors for each word in the vocabulary.

To find these matrices, GloVe minimizes the weighted squared difference between these dot products and the log co-occurrence counts using a least-squares optimization method. This results in word vectors that capture the relationships between words in the corpus.

More specifically, the objective function that is minimized during matrix factorization is:

J = ∑_ij f(X_ij) * (w_i · w̃_j + b_i + b̃_j – log(X_ij))^2

where w_i is the word vector for word i, w̃_j is the context vector for word j, b_i and b̃_j are bias terms, X_ij is the co-occurrence count, and f(X_ij) is a weighting function that down-weights rare co-occurrences and caps the influence of very frequent ones.

How is the objective function minimized?

In each iteration of SGD, a mini-batch of co-occurrence pairs (i, j) is selected from the co-occurrence matrix, and the gradients of the objective function with respect to the parameters are computed for each pair. The parameters are then updated using these gradients and a learning rate, which determines the step size of the updates.

This process is repeated until the objective function has converged to a minimum or a preset number of iterations has been reached. A full pass over all of the co-occurrence pairs is referred to as an epoch. SGD is an efficient way to minimize the objective function in GloVe because each update only uses a small batch of co-occurrence pairs, rather than requiring the full gradient (or second-order information such as the Hessian matrix) over the entire co-occurrence matrix.
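
A compact numpy sketch of the whole procedure on a toy corpus; the window size, dimensions, learning rate, and epoch count are chosen only for illustration, and the weighting function follows the commonly used form f(x) = (x / x_max)^α capped at 1:

```python
import numpy as np
from collections import defaultdict

corpus = "i have a dog i have a cat the dog and the cat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, window, lr, x_max, alpha = len(vocab), 8, 2, 0.05, 10.0, 0.75

# 1. Co-occurrence counts within a symmetric window.
X = defaultdict(float)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            X[(idx[w], idx[corpus[j]])] += 1.0

# 2. Initialize word vectors, context vectors, and biases.
rng = np.random.default_rng(0)
W, W_ctx = rng.normal(scale=0.1, size=(V, dim)), rng.normal(scale=0.1, size=(V, dim))
b, b_ctx = np.zeros(V), np.zeros(V)

def weight(x):
    return min((x / x_max) ** alpha, 1.0)   # down-weights rare pairs, caps frequent ones

# 3.-4. Minimize the weighted least-squares objective with SGD over co-occurrence pairs.
for epoch in range(50):
    total = 0.0
    for (i, j), x_ij in X.items():
        diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x_ij)
        f = weight(x_ij)
        total += f * diff ** 2
        grad = f * diff
        W[i], W_ctx[j] = W[i] - lr * grad * W_ctx[j], W_ctx[j] - lr * grad * W[i]
        b[i] -= lr * grad
        b_ctx[j] -= lr * grad

# 5. Final embeddings: sum of the word and context vectors.
embeddings = W + W_ctx
print("final objective:", round(total, 4))
```

On a real corpus, counting co-occurrences is the only pass over the raw text; everything after that operates on the much smaller set of nonzero X_ij entries.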

When should GloVe be used instead of Word2Vec?

GloVe (Global Vectors) and Word2Vec are two widely used methods for learning word vectors from a large corpus of text. Both methods learn vector representations of words that capture the semantics of the words and the relationships between them, and they can be used in various natural language processing tasks, such as language modeling, information retrieval, and machine translation.

GloVe and Word2Vec differ in the way they learn word vectors. GloVe learns word vectors by factorizing a co-occurrence matrix, which is a matrix that contains information about how often words co-occur in a given corpus. Word2Vec, on the other hand, learns word vectors using a shallow neural network with a single hidden layer.

One advantage of GloVe is that it is computationally efficient: the co-occurrence statistics are collected in a single pass over the corpus, and training then operates on this compact summary rather than on every individual word-context pair, which makes it well suited for large corpora. In practice, the two methods perform comparably, and which one works better on tasks such as word analogies or named entity recognition depends on the corpus, the hyperparameters, and the evaluation setup.

How is the co-occurrence matrix reduced to lower dimensions in GloVe?

In GloVe (Global Vectors), the dimensionality reduction happens through the factorization itself: the large, sparse co-occurrence matrix is approximated by word vectors and context vectors of much lower dimension (typically a few hundred). If an even lower-dimensional representation is needed, for example for visualization, the learned word vectors can be further reduced using techniques such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE).

To learn word vectors from the co-occurrence matrix in GloVe, the matrix is factorized into two matrices, called the word matrix and the context matrix, using a least-squares optimization method. The word matrix contains the word vectors for each word in the vocabulary, and the context matrix contains the context vectors for each word in the vocabulary.

After the word vectors have been learned, they can be reduced to lower dimensions using dimensionality reduction techniques. For example, PCA can be used to project the word vectors onto a lower-dimensional space, while t-SNE can be used to embed the word vectors in a two-dimensional space for visualization.

It is worth noting that reducing the dimensionality of the word vectors may result in some loss of information, as some of the relationships between words may be lost in the lower-dimensional space. Therefore, it is important to consider the trade-off between the dimensionality of the word vectors and their representational power.
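
For example, a set of learned word vectors can be projected down to two dimensions with scikit-learn (random vectors stand in for real embeddings here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(100, 50))    # pretend these are 100 GloVe vectors of dimension 50

pca_2d = PCA(n_components=2).fit_transform(word_vectors)                   # linear projection
tsne_2d = TSNE(n_components=2, perplexity=5).fit_transform(word_vectors)   # nonlinear, for visualization

print(pca_2d.shape, tsne_2d.shape)           # (100, 2) (100, 2)
```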

Interpreting GloVe from the Ratio of Co-occurrence Probabilities

GloVe uses the ratio of co-occurrence probabilities to learn the word vectors and context vectors. Specifically, it minimizes the difference between the dot product of the word and context vectors and the log of the ratio of co-occurrence probabilities. This allows GloVe to learn word vectors that capture the meanings and relationships between words in the language.

Chatting with GPT – Neural Networks Cont’d

The information provided about neural networks has been quite useful. I would like to learn more about loss functions. Please explain loss functions in the context of a neural network – what are some examples of loss functions? How are they used?

Loss functions are a key component of neural networks, as they help to evaluate how well a model is performing. They do this by calculating the difference between the model’s predicted output and the true target output, and then generating a scalar value that represents the model’s overall error.

There are many different loss functions that can be used in neural networks, depending on the specific task at hand. Some common examples include:

  1. Mean Squared Error (MSE) loss: This is a common loss function for regression tasks, where the goal is to predict a continuous numerical value. MSE loss is calculated by taking the average of the squared differences between the predicted output and the true output.
    MSE Loss = (1/n) * ∑(ŷ – y)^2
    A lower MSE loss indicates that the model is performing better and making more accurate predictions.
  2. Cross-Entropy Loss (CEL): This loss function is commonly used for classification tasks, where the goal is to predict a class label. CEL is calculated by taking the negative log of the predicted probability of the true class.
    CEL = – ∑(y * log(ŷ))
    A lower CEL indicates that the model is performing better and making more accurate predictions.
  3. Hinge Loss: This loss function is often used for support vector machines (SVMs) and is suitable for binary classification tasks. Hinge loss is calculated by taking the maximum of 0 and the difference between the predicted score for the true class and the predicted score for the false class, plus a margin.
    Hinge Loss = max(0, 1 – y * (w * x + b))
    where:
    y is the true class label (1 if the example belongs to the true class, -1 otherwise)
    w and b are the model’s weights and bias, respectively
    x is the input example


    The value 1 in the equation represents the margin. The margin is a hyperparameter that controls how far predictions must be from the decision boundary before they stop incurring loss: a larger margin makes the loss stricter (more predictions are penalized), while a smaller margin makes it more permissive.
  4. Kullback-Leibler Divergence (KLD) Loss: This loss function is often used in generative models, where the goal is to model the distribution of the data. KLD loss is calculated by taking the difference between the predicted distribution and the true distribution.
    KLD Loss = ∑(p(x) * log(p(x)/q(x)))
    where q(x) is the predicted distribution and p(x) is the true distribution

In general, loss functions are used to guide the training process in a neural network. During training, the model’s weights and biases are adjusted in an attempt to minimize the loss function. This process is known as backpropagation, and it helps the model to learn patterns in the data and make more accurate predictions.
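
To make the formulas above concrete, here is a small numpy example that computes the MSE and cross-entropy losses for a handful of made-up predictions:

```python
import numpy as np

# Regression: mean squared error.
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.1])
mse = np.mean((y_pred - y_true) ** 2)

# Classification: cross-entropy between one-hot targets and predicted class probabilities.
targets = np.array([[1, 0, 0],
                    [0, 0, 1]])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
cross_entropy = -np.sum(targets * np.log(probs)) / len(targets)

print("MSE:", round(float(mse), 4))              # average of the squared errors
print("Cross-entropy:", round(float(cross_entropy), 4))
```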

So once I choose a loss function for my neural network, how will it be used to estimate the weights?

Once you have chosen a loss function for your neural network, it will be used to guide the training process and estimate the weights of the model.

During training, the neural network is presented with a series of input examples, along with the corresponding true output (also known as the target output). The model processes the input and generates a predicted output. The loss function is then used to calculate the difference between the predicted output and the true output, which is known as the error.

The error is then used to update the model’s weights and biases in an attempt to minimize the loss function. This process is known as backpropagation, and it helps the model to learn patterns in the data and make more accurate predictions.

The weights and biases are updated using an optimization algorithm, such as stochastic gradient descent (SGD). During each training iteration, the optimization algorithm adjusts the weights and biases based on the error calculated by the loss function. This process is repeated until the model reaches a satisfactory level of accuracy, or until a specified number of training iterations has been reached.

Provide the steps of how the loss function and errors are used in backpropagation, include equations.

Backpropagation is a process used to train neural networks by adjusting the model’s weights and biases in an attempt to minimize the loss function. The following are the steps involved in backpropagation:

  1. Initialize the model’s weights and biases: Before training can begin, the model’s weights and biases must be initialized, typically with small random values drawn from a normal (or similar) distribution. Initializing all weights to 0 is generally avoided, because identical weights cause neurons in the same layer to compute identical outputs and receive identical updates.
  2. Feed forward: During each training iteration, the model is presented with a series of input examples, along with the corresponding true output (also known as the target output). The model processes the input and generates a predicted output, ŷ, using the following equation:

ŷ = f(w * x + b)

where:

  • f is the activation function
  • w and b are the model’s weights and bias, respectively
  • x is the input example
  3. Calculate the error: The loss function is then used to calculate the difference between the predicted output, ŷ, and the true output, y, which is known as the error. This can be expressed as follows:

Error = Loss(ŷ, y)

where Loss is the chosen loss function.

  4. Backpropagate the error: The error is then used to update the model’s weights and biases in an attempt to minimize the loss function. This is done using an optimization algorithm, such as stochastic gradient descent (SGD).
  5. Adjust the weights and biases: During each training iteration, the optimization algorithm adjusts the weights and biases based on the error calculated by the loss function. This can be expressed as follows:

w’ = w – α * ∂Loss/∂w
b’ = b – α * ∂Loss/∂b

where:

  • w’ and b’ are the updated weights and biases, respectively
  • α is the learning rate
  • ∂Loss/∂w and ∂Loss/∂b are the partial derivatives of the loss function with respect to the weights and biases, respectively
  6. Repeat until convergence: This process is repeated until the model reaches a satisfactory level of accuracy, or until a specified number of training iterations has been reached.
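
A minimal numpy sketch of these steps for a single sigmoid neuron trained with MSE loss on toy data; the partial derivatives are written out by hand to mirror the update equations above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 examples with 3 features each, and a target value per example.
x = rng.normal(size=(4, 3))
y = np.array([0.0, 1.0, 1.0, 0.0])

# Step 1: initialize the weights and bias.
w = rng.normal(scale=0.1, size=3)
b = 0.0
alpha = 0.5                                    # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    # Step 2: feed forward, y_hat = f(w * x + b).
    y_hat = sigmoid(x @ w + b)
    # Step 3: calculate the error with the loss function (MSE here).
    loss = np.mean((y_hat - y) ** 2)
    # Step 4: backpropagate - the chain rule gives dLoss/dw and dLoss/db.
    dloss_dyhat = 2 * (y_hat - y) / len(y)
    dyhat_dz = y_hat * (1 - y_hat)             # derivative of the sigmoid
    dloss_dz = dloss_dyhat * dyhat_dz
    dloss_dw = x.T @ dloss_dz
    dloss_db = dloss_dz.sum()
    # Step 5: adjust the weights and bias: w' = w - alpha * dLoss/dw, b' = b - alpha * dLoss/db.
    w -= alpha * dloss_dw
    b -= alpha * dloss_db

print("final loss:", round(float(loss), 4))
```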

How do I choose the data on which the model should be trained? How do I split training vs. test data? And what is the difference between training, test, and validation data sets?

When training a neural network, it is important to choose a representative and diverse dataset to ensure that the model is able to learn patterns in the data that generalize well to new, unseen examples.

Once you have chosen a dataset, it is common practice to split it into three sets: training, validation, and test.

  • Training data is used to fit the model and learn the patterns in the data. The model is presented with a series of input examples, along with the corresponding true output (also known as the target output). The model processes the input and generates a predicted output, and the loss function is used to calculate the difference between the predicted output and the true output. This error is then used to update the model’s weights and biases in an attempt to minimize the loss function. This process is repeated until the model reaches a satisfactory level of accuracy, or until a specified number of training iterations has been reached.
  • Validation data is used to tune the model’s hyperparameters, such as the learning rate and the regularization coefficients. The model is presented with a series of input examples from the validation set, and the loss function is used to calculate the error between the predicted output and the true output. The hyperparameters are then adjusted in an attempt to minimize the validation loss.
  • Test data is used to evaluate the model’s generalization performance. The model is presented with a series of input examples from the test set, and the loss function is used to calculate the error between the predicted output and the true output. This error is used to measure the model’s accuracy on new, unseen data.

It is important to keep the training, validation, and test sets separate, as mixing them could result in a model that performs well on the training and validation sets, but poorly on the test set. This could lead to overfitting, which is when a model is overly optimized for the training and validation data, but fails to generalize well to new, unseen examples.

There are a few different strategies for splitting a dataset into training, validation, and test sets. One common approach is to use a 70/15/15 split, where 70% of the data is used for training, 15% is used for validation, and 15% is used for testing. Another approach is to use a 80/10/10 split, where 80% of the data is used for training, 10% is used for validation, and 10% is used for testing. The specific split will depend on the size and diversity of the dataset, as well as the specific requirements of the task at hand.
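
For instance, a 70/15/15 split can be produced by shuffling the example indices once and slicing (scikit-learn’s train_test_split called twice would work just as well):

```python
import numpy as np

rng = np.random.default_rng(42)

X = rng.normal(size=(1000, 10))      # 1000 examples with 10 features each
y = rng.integers(0, 2, size=1000)    # binary labels

# Shuffle the indices once so all three sets are drawn from the same distribution.
order = rng.permutation(len(X))
n_train = int(0.70 * len(X))
n_val = int(0.15 * len(X))

train_idx = order[:n_train]
val_idx = order[n_train:n_train + n_val]
test_idx = order[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_val), len(X_test))   # 700 150 150
```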

Can I use a neural network for both regression and classification problem types?

Yes, neural networks can be used for both regression and classification problems.

In a regression problem, the output of the model is a continuous value, such as a price or a probability. The goal is to predict the output value based on the input features. For example, you might use a neural network to predict the price of a house based on its size, location, and other features.

In a classification problem, the output of the model is a categorical value, such as a class label. The goal is to predict the class label of an input sample based on its features. For example, you might use a neural network to predict whether an email is spam or not based on the contents of the email.

Neural networks are very flexible and can be used to solve a wide range of problems. They can be used for both regression and classification, as well as other types of problems such as clustering, generation, and reinforcement learning.

What is meant by regularization in regards to a neural network model?

Regularization is a technique that is used to prevent overfitting in a neural network model. Overfitting occurs when a model is too complex and has too many parameters, which leads to poor generalization to new, unseen data.

There are several ways to regularize a neural network model, including:

  1. L1 regularization: This involves adding a penalty term to the loss function that is proportional to the absolute value of the weights. This has the effect of pushing the weights towards 0, which can help reduce overfitting.
  2. L2 regularization: This involves adding a penalty term to the loss function that is proportional to the square of the weights. This has the effect of pushing the weights towards 0, which can help reduce overfitting.
  3. Dropout: This involves randomly dropping out units (neurons) from the network during training, which helps prevent overfitting by stopping the network from relying too heavily on any individual neuron (the units cannot co-adapt).
  4. Early stopping: This involves stopping the training process early if the model starts to overfit, based on the performance on a validation set.

Regularization is an important technique for improving the generalization performance of a neural network model. It helps the model to generalize better to new, unseen data, rather than just memorizing the training data.
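
As a small illustration of the first and third techniques, here is how an L2 penalty can be added to a loss and how a dropout mask is applied during training (numpy, with made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

# L2 regularization: add lambda * sum(w^2) to the data loss.
weights = rng.normal(size=20)
data_loss = 0.37                        # pretend this came from MSE or cross-entropy
l2_lambda = 0.01
total_loss = data_loss + l2_lambda * np.sum(weights ** 2)

# Dropout: during training, randomly zero out activations with probability p
# and rescale the survivors so the expected activation stays the same ("inverted dropout").
activations = rng.normal(size=(4, 10))
p_drop = 0.5
mask = (rng.random(activations.shape) >= p_drop) / (1.0 - p_drop)
dropped = activations * mask            # at test time, dropout is simply turned off

print(round(float(total_loss), 4), dropped.shape)
```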

Off-topic, using NLP to summarize Russian propaganda articles on the invasion of Ukraine

I have a pretty simple article summarizer in python that I use for my blog. It’s not machine learning, it’s pretty basic – collecting N articles based on a search query, cleaning up the text, tokenizing, removing stop words. Then word frequencies are calculated and each sentence is scored based on those frequencies. Top N sentences are selected for the summary.
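
A stripped-down sketch of that kind of scorer; a tiny hard-coded stop-word list and naive regex tokenization stand in for the real cleanup, tokenization, and stop-word removal steps:

```python
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

def summarize(text, top_n=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-zа-яё]+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    # Score each sentence by the summed frequencies of its (non-stop) words.
    scores = []
    for sentence in sentences:
        tokens = [w for w in re.findall(r"[a-zа-яё]+", sentence.lower()) if w not in STOP_WORDS]
        scores.append(sum(freq[t] for t in tokens))
    # Keep the top-N sentences, in their original order.
    best = sorted(sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n])
    return " ".join(sentences[i] for i in best)

print(summarize("The dog sleeps. The dog eats and the cat eats. A bird sings."))
```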

I usually use it to summarize articles on some specific supplements, but today for fun I decided to try a different search query. My query was “путин россия украина спецоперация” (“Putin Russia Ukraine special operation”). I think that even though this summarizer is pretty simple, the summary turned out to be pretty meaningful. It gives a pretty good overview of what sort of info you would get if you were reading Russian news. I selected the top 15 sentences for the summary.

As for the president’s words about the special operation in the Donbas, they show that our servicemen are once again saving the world from the brown plague of Nazism, and all assigned objectives will, without any doubt, be accomplished. “The Supreme Commander-in-Chief confirmed once again that the special operation in Ukraine will be brought to its logical conclusion according to the objectives he announced on February 24, the day the special military operation began,” the colonel said. And in his speech the president pointed to allies in Europe, Asia, Africa, and Latin America who do not bow to the so-called hegemon, choose a sovereign path of development, and want to resolve security issues collectively and form a multipolar world.

Ukrainian publications claim that the National Security and Defense Council approved sanctions against Patriarch Kirill, as well as Sergey Kiriyenko, Yevgeny Prigozhin, Roman Abramovich, Oleg Deripaska, Mikhail Fridman, Viktor Medvedchuk, and also Viktor Yanukovych. “Our duty to the memory of the millions of victims of the Second World War is to react harshly to attempts to falsify history and to counter the spread of any forms of neo-Nazism, Russophobia, and racism,” the president urged. Everything indicates that the temporal, human, material, and diplomatic resources of the “special operation” are nearing exhaustion, and Putin is taking a decisive step to finish as soon as possible, locking in the gains and losses. “Our task, our mission, that of the soldiers and the Donbas militias, is to end this war, to protect the people and, of course, to protect Russia itself,” Putin emphasized.

Putin’s speech demonstrates the readiness of the Russian Armed Forces to carry the special military operation in Ukraine through to a victorious end, military expert, intelligence veteran, and colonel Anatoly Matviychuk explained to URA.RU. “The war unleashed by the West and the Kyiv junta will be finished,” Krasov concluded. Russian precision weapons are years and decades ahead of their foreign counterparts and significantly surpass them in tactical and technical characteristics, while the defense industry underpins Russia’s sovereignty, the president noted.

Speaking of steps taken by the West, Vasily Piskarev, chairman of the State Duma commission investigating foreign interference in Russia’s internal affairs, earlier stated that foreign NGOs are planting radical ideologies in Russian society, RAPSI reports. “At first the collective West went out of its way to prove that it had supposedly torn the Russian economy to shreds; now they think they have isolated our country from the rest of the world.” The Russian Armed Forces reliably defend their country and bring freedom to other peoples, Putin declared while opening the Army-2022 forum and the 2022 International Army Games. “Russia has several military options in reserve, and for the US and its NATO allies these scenarios are serious cause for concern,” the magazine writes. Even though the West was encouraged by the latest “successes” of the Ukrainian Armed Forces, US General Mark Milley urged caution about Russia’s unpredictability.

Yerba Mate (Ilex Paraguariensis) articles summary using NLP

The following summary was created using a google search for specific phrases and then performing natural language processing steps for sentence scoring. Yerba mate is an evergreen tree/shrub that grows in subtropical regions of South America. The leaves of the plant are used to make tea. Yerba mate tea contains caffeine and theobromine, which are known to affect the mood. I was interested in summarizing the existing articles in regards to research on this plant in psychiatry.

The first search phrase used was “yerba mate psychiatry depression research evidence“, and the number of collected articles for this phrase was 18. The text from all articles was combined, and relative word frequencies were calculated (after removing stop-words). These relative frequencies were then used to score each sentence. Sentence length distribution was checked, and the 90th percentile of 30 words was chosen to select sentences below the maximum length. Below are the 10 highest scoring sentences that summarize the text from the 18 articles.

We can infer from the summary that studies have been performed using yerba mate extract on rats, with behavioral tasks chosen as proxies for the rats’ depression and anxiety levels. There are no mentions of human studies in the summary. The chosen sentences also indicate that, based on these studies, yerba mate has potential antidepressant activity and may improve memory as well. The results of the anxiety study were not mentioned, and it’s not clear whether there were any side effects from yerba mate. These results are in line with descriptions of personal experiences of reddit users that I have reviewed, as many report better mood and improved focus after drinking yerba mate tea. Some users do report increased anxiety correlated with yerba mate consumption.

View abstract. J Agric.Food Chem. Vitamin C Levels Cerebral vitamin C (ascorbic acid (AA)) levels were determined as described by Jacques-Silva et al. Conclusion: In conclusion, the present study showed that Ilex paraguariensis presents an important effect on reducing immobility time on forced swimming test which could suggest an antidepressant-like effect of this extract. Despite previous some studies show the antidepressant-like activity of flavonoids [31, 32] which are present in the extract of I. paraguariensis, any study has evaluated the possible antidepressant-like activity of it. The presence of nine antioxidants compounds was investigated, namely, gallic acid, chlorogenic acid, caffeic acid, catechin, quercetin, rutin, kaempferol, caffeine, and theobromine. Abstract In this study, we investigated the possible antidepressant-like effect of I. paraguariensis in rats. Another study showed that an infusion of I. paraguariensis can improve the memory of rats treated with haloperidol and this effect was related to an indirect modulation of oxidative stress . In addition to flavonoids as quercetin and rutin and phenolic compounds as chlorogenic and caffeic acids, yerba mate is also rich in caffeine and saponins . After four weeks, behavioral analysis of locomotor activity and anxiety was evaluated in animals receiving water (n = 11) or I. paraguariensis (n = 9). In the same way, we evaluated if the presence of stimulants compounds like caffeine and theobromine in the extract of I. paraguariensis could cause anxiety. In the present study, we evaluated the possible antidepressant-like effect of I. paraguariensis by using forced swimming test (FST) in rats. Forced Swimming Test This experiment was performed using the FST according to the method previously published by Porsolt et al. In this context, Yerba mate (Ilex paraguariensis) is a beverage commonly consumed in South America especially in Argentina, Brazil, Uruguay, and Paraguay. I. paraguariensis reduced the immobility time on forced swimming test without significant changes in locomotor activity in the open field test.

I also tried several other search phrases, such as “yerba mate mood anxiety evidence” and “yerba mate side effects evidence“. In total, 17 articles were collected for the first query and 19 articles for the second query. The summaries are presented below. There was nothing in the summary directly discussing mood or anxiety, but there are mentions of neuroprotective effects and antioxidant effects. We can also learn that a cup of yerba mate tea has a caffeine content similar to a cup of coffee, and that drinking yerba mate is not recommended while pregnant or breastfeeding. As in the previous summary, no human trials were mentioned, so it seems that all the summarized studies were performed on rats. The side effects query summary mentions the risk of transferring caffeine from the tea to the fetus when pregnant, as well as a link to cancer for those who drink both alcohol and yerba mate. It also mentions that anxiety is a side effect of excessive consumption of the tea.

Query 1:
View abstract. J Agric.Food Chem. On the other hand, studies conducted on an animal model showed chemopreventive effects of both pure mate saponin fraction and Yerba Mate tea in chemically induced colitis in rats. Yerba Mate Nutrition Facts The following nutrition information is provided by the USDA for one cup (12g) of a branded yerba mate beverage (Mate Revolution) that lists just organic yerba mate as an ingredient. Researchers found that steeping yerba mate (such as in yerba mate tea) may increase the level of absorption. Yerba mate beverages are not recommended for children and women who are pregnant or breastfeeding. Chlorogenic acid and theobromine tested individually also had neuroprotective effects, but slightly weaker than Yerba Mate extract as a whole, but stronger than known neuroprotective compounds, such as caffeine [ 83 ]. The caffeine content in a cup (about 150 mL) of Yerba Mate tea is comparable to that in a cup of coffee and is about 80 mg [ 1 , 11 , 20 ]. In aqueous and alcoholic extracts from green and roasted Yerba Mate, the presence of chlorogenic acid (caffeoylquinic acid), caffeic acid, quinic acid, dicaffeoylquinic acid, and feruloylquinic acid was confirmed. After consumption of Yerba Mate tea, antioxidant compounds are absorbed and appear in the circulating plasma where they exert antioxidant effects [ 55 ]. According to the cited studies, Yerba Mate tea consumption attenuates oxidative stress in patients with type 2 diabetes, which may prevent its complications.

Query 2:
View abstract. J Agric.Food Chem. Because yerba mate has a high concentration of caffeine, drinking mate tea while pregnant can increase the risk of transferring caffeine to the fetus. J Ethnopharmacol. South Med J 1988;81:1092-4.. View abstract. J Am Coll Nutr 2000;19:591-600.. View abstract. Am J Med 2005;118:998-1003.. View abstract. J Psychosom Res 2003;54:191-8.. View abstract. Yerba mate consumed by those who drink alcohol is linked to a higher risk of developing cancer. Anxiety and nervousness are a side effect of excessive yerba mate tea consumption.

NLP: Summarizing l-theanine articles

In this post I will describe my use of NLP (natural language processing, not neuro-linguistic programming; natural language processing is cool, while neuro-linguistic programming is some pseudoscience stuff) for summarizing articles from the internet. Specifically, I chose the topic of l-theanine and psychiatry, since I have already summarized the Nootropics subreddit discussions on l-theanine. The next step, therefore, is to summarize existing articles on this topic.

Previous post: Summarizing experience with green tea from the Nootropics subreddit

The first step was to perform an automated Google search for a specific term. I chose the term “l-theanine psychiatry” and set the number of unique URLs to 15. Some of the resulting URLs are listed below:

Can L-Theanine Help Treat Symptoms of Bipolar Disorder?

Effects of L-Theanine Administration on Stress-Related Symptoms and Cognitive Functions in Healthy Adults: A Randomized Controlled Trial

L-theanine

How does the tea L-theanine buffer stress and anxiety

It can be seen that the article titles are quite relevant to our topic. The next step is formatting the text and summarizing the information.
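For concreteness, here is a minimal sketch of the collection step just described. The original post does not name the libraries used, so the `googlesearch` (googlesearch-python), `requests`, and BeautifulSoup packages below, as well as the `collect_articles` helper, are my own assumptions rather than the author's actual code:

```python
# Hypothetical sketch of the collection step; the libraries and helper name are
# assumptions, not necessarily what the original post used.
import requests
from bs4 import BeautifulSoup
from googlesearch import search  # assumption: the googlesearch-python package


def collect_articles(query: str, n_urls: int = 15) -> str:
    """Search Google for `query`, fetch each result page, and return the combined text."""
    texts = []
    for url in search(query, num_results=n_urls):
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to load
        # Keep only the visible text of the page.
        texts.append(BeautifulSoup(html, "html.parser").get_text(separator=" "))
    return " ".join(texts)


combined_text = collect_articles("l-theanine psychiatry", n_urls=15)
```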

The idea behind the summarization technique is to calculate word frequencies for each word in the combined text of all articles (after stop-word removal), and then select the words in the top 10% of frequencies. These words are the ones used for scoring each sentence. More frequent words are given more weight, as they are deemed more relevant to the chosen topic, so sentences containing those words receive higher scores. This is not a machine learning approach, but a basic frequency count method. In total, 148 words were used for sentence scoring. Some of the most frequent words (from all articles combined) are listed below:

Theanine, administration, effects, placebo, weeks, study, four, sleep, scores, cognitive, may, stress, function, fluency, studies, related, symptoms, participants, bacs, anxiety
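As an illustration, here is a minimal sketch of the frequency-based scoring just described; NLTK for tokenization and stop words is an assumption (the post does not name the libraries), and the `summarize` helper and its parameters are mine:

```python
# Sketch of the frequency-based extractive summary described above (assumes NLTK).
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)


def summarize(text: str, keyword_fraction: float = 0.10, n_sentences: int = 15) -> list[str]:
    """Return the highest-scoring sentences of `text` based on word frequencies."""
    stop_words = set(stopwords.words("english"))

    # Word frequencies over the combined text, ignoring stop words and punctuation.
    words = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    freqs = Counter(w for w in words if w not in stop_words)

    # Keep only the top fraction of words (top 10% here) as scoring keywords.
    n_keywords = max(1, int(len(freqs) * keyword_fraction))
    keyword_scores = dict(freqs.most_common(n_keywords))

    # Score each sentence as the sum of the frequencies of the keywords it contains.
    scored = [(sum(keyword_scores.get(w.lower(), 0) for w in word_tokenize(s)), s)
              for s in sent_tokenize(text)]

    # Return the top-scoring sentences as the summary.
    return [s for _, s in sorted(scored, reverse=True)[:n_sentences]]
```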

BACS was one of the most frequent words; it stands for the Brief Assessment of Cognition in Schizophrenia. Once each sentence was scored, the 15 highest-scoring sentences were selected to create a summary. The summary of the articles is presented below. From the summary we can infer that l-theanine was studied for its effects on cognition, anxiety, and stress. Some studies had positive results, indicating that l-theanine performed significantly better than placebo with regard to positive cognitive effects such as improved verbal fluency and executive function. Studies also noted significant improvements in stress reduction with the use of l-theanine. Other studies did not find any significant differences between l-theanine and placebo.


Second, only about 20% of symptoms (the PSQI subscales) and cognitive functions (the BACS verbal fluency, especially letter fluency and executive function) scores showed significant changes after L- theanine administration compared to the placebo administration, suggesting that the effects are not large on daily function of the participants.

Although psychotropic effects were observed in the current study, four weeks L-theanine administration had no significant effect on cortisol or immunoglobulin A levels in the saliva or serum, which was inconsistent with previous studies reporting that salivary cortisol [34] and immunoglobulin A [33] levels were reduced after acute L-theanine administration.

Considering the comparison to the placebo administration, the current study suggests that the score for the BACS verbal fluency, especially letter fluency, but not the Trail Making Test, Stroop test, or other BACS parameters, significantly changes in response to the 4 weeks effects of L-theanine.

The BACS verbal fluency, especially letter fluency (p = 0.001), and executive function scores were significantly increased after L-theanine administration (p = 0.001 and 0.031, respectively; ), while the Trail Making Test A and B scores were significantly improved after placebo administration (p = 0.042 and 0.038, respectively).

When score reductions in the stress-related symptoms were compared between L-theanine and placebo administrations, changes in the PSQI sleep latency, sleep disturbance, and use of sleep medication subscales were significantly greater (p = 0.0499, 0.046, and 0.047, respectively), while those in the SDS and PSQI scores showed a non-statistically significant trend towards greater improvement (p = 0.084 and 0.073, respectively), during the L-theanine period compared to placebo.

Stratified analyses revealed that scores for verbal fluency (p = 0.002), especially letter fluency (p = 0.002), increased after L-theanine administration, compared to the placebo administration, in individuals who were sub-grouped into the lower half by the median split based on the mean pretreatment scores.

Discussion In this placebo-controlled study, stress-related symptoms assessed with SDS, STAI-T, and PSQI scores decreased, while BACS verbal fluency and executive function scores improved following four weeks L-theanine administration.

The present study aimed to examine the effects of four weeks L-theanine administration (200 mg/day, four weeks) in a healthy population, i.e., individuals without any major psychiatric disorder.

The PSQI subscale scores for sleep latency, sleep disturbance, and use of sleep medication reduced after L-theanine administration, compared to the placebo administration (all p < 0.05).

The effects on stress-related symptoms were broad among the symptom indices presented in the study, although a comparison to the placebo administration somewhat limits the efficacy of L-theanine administration for some sleep disturbance measurements.

For cognitive functions, BACS verbal fluency and executive function scores improved after four weeks L-theanine administration.

PMID: 31623400 This randomized, placebo-controlled, crossover, and double-blind trial aimed to examine the possible effects of four weeks L-theanine administration on stress-related symptoms and cognitive functions in healthy adults.

The anti-stress effects of L-theanine (200 mg/day) have been observed following once- [ 33 , 34 ] and twice daily [ 35 ] administration, while its attention-improving effects have been observed in response to treatment of 100 mg/day on four separate days [ 36 ] and 200 mg/day single administration [ 37 ], which was further supported by decreased responses in functional magnetic resonance imaging [ 38 ].

These results suggest that four weeks L-theanine administration has positive effects on stress-related symptoms and cognitive function in a healthy population.

Summarizing experience with green tea from the Nootropics subreddit

We can’t all get our own labs with grad assistants and grants in order to conduct research, but that doesn’t mean there aren’t other ways to obtain data. Some might say that only randomized trials with test and control groups matter, but I believe that subreddits can provide supplemental information. We should take the data with a grain of salt, but a lot of people do sincerely describe their experiences with nootropics on Reddit. Users also often link studies and scientific articles in the forums.

Not all nootropics are covered by randomized studies, and psychiatrists rarely collect data on experiences with nootropics. For these reasons people mostly discuss their experiences with nootropics and supplements online, in forums such as subreddits and Facebook groups. For example, there have not been many studies on lithium orotate, but probably thousands of people are taking it at the moment. There are very few published papers on this supplement, so how could one find out about possible benefits and side effects? Personally I had a good experience with very small doses of lithium orotate helping to reduce intrusive thoughts and revive memories from the past. Where does information about my experience exist? Only in the Nootropics and depressionregimens subreddits. No psychiatrist or doctor was ever interested in my experience with microdosing lithium, but that doesn’t mean that this information could not be useful to someone else.

There are multiple Facebook groups specifically dedicated to topics such as treatment-resistant depression, PMDD, borderline personality disorder, etc. There are a lot of discussions of supplements in those groups, but unfortunately I don’t know how to obtain data from Facebook. The good thing about Reddit is that it offers a free API that allows you to download data from subreddits, so you can get post titles, text, and comments, up to about 1000 posts per subreddit per day. Thank you, Reddit! This is really great!

For this exercise, I decided to use natural language processing to summarize text from the Nootropics subreddit, filtering for posts about green tea. I filtered the subreddit's posts for those containing keywords from the following list: green tea, theanine, ltheanine, matcha, l-theanine, and l theanine. Matcha is a type of green tea powder, therefore still green tea, and l-theanine is a water-soluble amino acid found in green tea. In total there were 730 posts in my dataset, with post creation dates ranging from September 2011 to January 2022.
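The post mentions the Reddit API but not the client used to pull the data; the sketch below assumes PRAW with placeholder credentials. Since the dataset reaches back to 2011, a historical archive such as Pushshift may actually have been used, so this is shown only to illustrate the keyword filtering step:

```python
# Sketch of the data collection, assuming PRAW; credentials are placeholders.
import praw

KEYWORDS = ["green tea", "theanine", "ltheanine", "matcha", "l-theanine", "l theanine"]

reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                     client_secret="YOUR_CLIENT_SECRET",
                     user_agent="nootropics-green-tea-summary")

posts = []
# Listing endpoints return roughly the most recent ~1000 posts per listing.
for submission in reddit.subreddit("Nootropics").new(limit=1000):
    text = f"{submission.title} {submission.selftext}".lower()
    if any(kw in text for kw in KEYWORDS):
        posts.append({"title": submission.title,
                      "body": submission.selftext,
                      "created_utc": submission.created_utc})
```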

Examples of post titles:

  • L Theanine cured my ( social) anxiety. I’m blown away. I’m usually socially awkward but I MASTERED a job interview.
  • Comprehensive List of GABAA Receptor Anxiolytics That Potentially Produce no Tolerance or Dependence.
  • Green tea supplement ruins man’s liver
  • Neurochemical and behavioral effects of green tea (Camellia sinensis): A model study Increased serotonin and dopamine metabolism
  • Why do 1-2 cups of green tea seem to anecdotally calm so many people down in this subreddit, even though there are only trace amounts of theanine in a cup?

Once the data was collected, the title and body were combined for each post and text processing was performed. Processing included removing accented characters, expanding common contractions, removing newlines, tabs, and special characters. Python’s spellchecker library was used in order to correct spelling errors.
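A minimal sketch of this cleaning step, assuming the `unidecode`, `contractions`, and `pyspellchecker` packages (the post only says that "Python's spellchecker library" was used, so the exact packages and the helper names are my assumptions):

```python
# Sketch of the cleaning steps; package choices are assumptions.
import re

import contractions                    # assumption: the `contractions` package
from spellchecker import SpellChecker  # assumption: the `pyspellchecker` package
from unidecode import unidecode        # assumption: the `unidecode` package


def clean_post(text: str) -> str:
    """Apply the cleaning steps described above to one post (title + body)."""
    text = unidecode(text)                              # strip accented characters
    text = contractions.fix(text)                       # expand common contractions
    text = re.sub(r"[\n\t]+", " ", text)                # remove newlines and tabs
    text = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", text)   # drop special characters
    return re.sub(r"\s+", " ", text).strip()


def correct_spelling(text: str) -> str:
    """Replace words the spellchecker does not recognize with its best guess."""
    spell = SpellChecker()
    words = text.split()
    misspelled = spell.unknown(words)
    return " ".join((spell.correction(w) or w) if w in misspelled else w for w in words)
```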

The first summary method used was word frequencies and sentence scoring. All posts were combined into one document, and the frequency of each word was calculated. In total there were 10,677 unique words/terms, but not every term was meaningful, and some appeared only a few times. For this reason, only the top 5% most frequent words were selected to be used for sentence scoring. I also assigned higher scores to the words green, tea, theanine, ltheanine, and matcha, in order to capture more posts that are likely to focus on green tea.

The combined text was separated into individual sentences, and sentences were then scored by adding up the scores of each word in the sentence. As mentioned above, the top 5% most frequent words were assigned scores above 0, with the score being equal to the word's frequency; the remaining words were assigned a score of 0. Some of the most frequent words included anxiety, effects, ltheanine, day, good, sleep, caffeine, depression, tea, help, work, brain, and life.
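The only difference from the article-summary scoring above is the boosted weight for the green tea keywords. A small sketch, assuming a `word_freqs` table built from the top 5% most frequent words as before; the boost factor is arbitrary, since the post does not state the exact weights used:

```python
# Sketch of the boosted scoring; the boost factor is illustrative only.
GREEN_TEA_TERMS = {"green", "tea", "theanine", "ltheanine", "matcha"}
BOOST_FACTOR = 3  # arbitrary choice; the post does not state the exact weights used


def score_sentence(sentence: str, word_freqs: dict[str, int]) -> int:
    """Score a sentence by summing word frequencies, up-weighting green tea terms.

    `word_freqs` is the table of top-5% word frequencies built as in the earlier sketch.
    """
    return sum(word_freqs.get(w, 0) * (BOOST_FACTOR if w in GREEN_TEA_TERMS else 1)
               for w in sentence.lower().split())
```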

Here are the resulting top ten sentences. Some resulting sentences were quite long, so I am pasting the sentence parts most relevant to green tea.

L-Theanine: Glutamate inhibitor * Increases glycine by 17.2% for one week * Increases -1-waves within 30-45m orally * At certain dosages, can increase GABA by 19.8% * Antagonizes AMPA and Kainate * [ * Partial co-agonist for NMDA, though significantly less potent than endogenous ligands * Blocks glutamate transporters(and therefore reuptake of glutamate and glutamine) * Not sedative in regular doses but promotes relaxation * Only those who have high baseline anxiety benefit from relaxation * Nontoxic and noncarcinogenic in very high doses (4g/kg).

L-Theanine + Taurine * Anti-excitatory and sedative * Highly bioavailable and consistent * L-Theanine + Taurine + Agmatine * Anti-excitatory and sedative * Highly bioavailable and consistent * Potentiates GABAergic and can suppress NMDA better than theanine * Anti-tolerance building * L-Theanine + Rosmarinic Acid * Both are anti-glutaminergic * Potent GABAA agonist comparable to benzos * Low total formula dose * 400mg L-Theanine + 150mg RA (1875mg Rosemary extract) * Taurine + Ashwagandha * GABAA potentiation of Taurine * NMDA suppression * L-Theanine + Taurine + Ashwagandha * GABAA potentiation of Taurine * Total glutamate suppression * Taurine + Magnolia * GABAA potentiated at benzo site plus influx of GABA in body * Apigenin + Magnolia * GABAA 1 agonist plus PAM * Both very potent * Chinese Skullcap + Magnolia * GABAA 2 + 3 agonist plus PAM * Chinese Skullcap + Apigenin + Magnolia * GABAA 1 + 2 + 3 agonist plus PAM EDIT: Added GABA-T and GAD explanations EDIT 2: Found new and more accurate evidence claiming that L-Theanine is actually an NMDA partial co-agonist, not an antagonist. This backs up sources that claim to see Ca2+ activity increase and become suppressed with NMDA antagonists. It also backs up sources finding L-Theanine to be an NMDA antagonist.

HELPED SOMEWHAT, OR NOT TOO SURE Cyamemazine (anxiety), alimemazine (sleep), magnesium L-threonate and glycinate (sleep), white noise (anxiety), SuperBetter app, vitamin D3, reading about Young schemas, ginkgo biloba (energy, focus), melatonin (sleep), chamomile (sleep), verbena (sleep), lavender (sleep), ALCAR, taurine, NAC, cannabis (sleep), gratitude journal, music (anxiety), coleus forskohlii (weight loss), CLA (from safflower, weight loss), metformin (weight loss, triggered an hypoglycemia the first time I tried it), green tea, risperidone (anxiety, cravings, irritability), L-methylfolate. DID NOT SEEM TO HELP Inositol, chromium picolinate, zinc, CoQ10, apple cider vinegar, meratrim (sphaeranthus indicus + garcinia mangostana), hydroxyzine, tiapride, binaural beats.
L-theanine :** Pretty good anxiolytic, and helps a bit with focus, especially when combined with coffee. Not too sedative.
 **CBD :** When L-theanine or skullcap is not quiet enough, can add some extra anxiolysis, but nothing spectacular either, and not enough on its own.

Medication and supplement augmentation suggestions. My diagnosis is Major Depression/Generalized Anxiety. Possibly on the light end of the Borderline spectrum. I also have Restless Leg Syndrome hence some of the meds. High Cholesterol and stage 2 hypertension. Current regimen is: Bupropion 100mg Lithium Carbonate 300mg (1 in morning, 2 before bed) Gabapentin 300mg (3 times a day) Pramipexole 1mg (at bedtime) Turmeric/bioperine 2000mg x2 Omega 3 Fish Oil 3,600mg x2 Vitamin D3 5,000IU x2 Vitamin C 500mg Multivitamin L-Theanine 200mg Kratom caps (4-6 size 00 about 3 times a week with at least a day between) Tianeptine 25mg (Monday, Wednesday, Friday only) Phenibut (1 size 00 Tuesday/Thursday only).

l-Theanine, Cannabis, Glutamate/GABA and Anxiety: Could this be a potential cure for Cannabis Induced Paranoia and Anxiety? Just a thought – But could Glutamate be responsible for the anxiety and paranoia commonly felt from cannabis? This is just under informed speculation, but THC has been found to increase striatal glutamate in the brain. ( L-Theanine has been found to “block” glutamate signaling in the brain. See here; >L-theanine relieves anxiety in large part because it bears a close resemblance to the brain-signaling chemical glutamate. L-theanine produces the opposite effect in the brain. >While glutamate is the brains most important excitatory neurotransmitter, L-theanine binds to the same brain cell receptors and blocks them to glutamates effects. This action produces inhibitory effects.1,2 That inhibition to brain overactivity has a calming, relaxing effect in which anxiety fades.3 > I have always noticed that when I take L-Theanine, it helps me get higher from cannabis, all while blocking any paranoia and anxiety that I get from it. Cannabis is the only drug I have found that is potentiated by L-Theanine. With other substances, I have noticed that L-Theanine blocks a lot of the pleasurable effects, while reducing anxiety (Namely when taken with stimulants, but also with Phenibut) Since Cannabis increases glutamate in the brain, and Glutamate is associated with anxiety, and L-Theanine essentially blocks it, could L-Theanine be a good anxiety and paranoia cure for weed? Will somebody with more knowledge on this subject help me out here?

How much trouble am I in when I show this to my PsyD? Venturing outside my personal echo chamber to solicit general opinions on my supplement regime. Cognizant that I am doing it wrong, but I will really feel that I am doing it wrong when I start getting grumpy. Please don’t hate me. L-Theanine 200mg L-Carnosine 1000mg Reishi Extract 2000mg Cats Claw 1000mg Alpha-lipoic acid 500mg Ashwagandha 250mg Synapsa Bacopa Monnieri 640mg N-acetyl l-cysteine 1200mg Palmitoylethnolamide 800mg Maitake mushroom extract 3000mg Chaga 3600mg Polygala Tenuifolia 200mg Lions mane 4200mg Acetyl l-carnitine 500mg Sarcosine 2000mg Wulinshen 1000mg.

Caffeine + L-Theanine. Like the beginners guide says, Id recommend this stack for anyone looking to wet their feet with nootropics. The 1 (Caffeine):2 (L-Theanine) ratio works well for me, but in order for me to really feel the L-Theanine I need to take it on a empty stomach. My favorite dosage is 200mg Caffeine and 400mg of L-Theanine immediately before writing. It helps me to be very relaxed, not filter my thoughts, and achieve a flow state. This stack combined with music from MajesticCasual or TheSoundYouNeed YouTube channels is pretty amazing for writing. BTW, for some people 400mg of L-Theanine is too much and may make you drowsy (though not for me). L-Theanine helps reduce anxiety, but I try to make sure I meditate instead of relying on L-Theanine. I save it for when I am writing.

Please give Dosage Guidance: Kava Kava – 700 MG Capsules (This I just ordered to try, not take daily, I have never tried Kava before) – Maybe 1x a day Sulbutiamine Capsules/Tablets 200 MG – 2 Capsules once a day (400 MG 1x a Day) Uridine Monophosphate Capsules 250mg – (500-750 MG 1x a Day) Agmatine Sulfate 250mg Capsules – Maybe start with 2 Capsules 1x a day? Agmatine Sulfate 1000mg Capsules – Only for going up on higher doses. L-Theanine 200 MG – 1x a Day Mens Daily Multimineral Multivitamin Supplement – 1x a Day Vitamin D3 5,000 IU – 1x a day Vitamin B Complex – 1x a day Magnesium Glycinate 400 MG – 1x a Day Omega 3 Fish Oil 4,080mg – EPA 1200mg + DHA 900mg Capsules – 1x a Day Kratom – 4 grams – 3x a day Ashwaghanda – KSM-66, 300mg, 2x a day. Ashwagandha – Sensoril 125mg Do not know dosage? Youtheory Sleep Powder Advanced, 6 Ounce Bottle – It has full doses of a few calming amino acids and some melatonin. TLDR: I want to quit my antidepressants, and purchased a bunch of Supplements to start taking while weaning off my meds, please give me help/tips on this journey, as well as check out the list and let me know if you recommend other supplements to try, or any tips on how to take these.

L-theanine has done wonders for me sleep, anxiety and productivity. With L-t I have had much better sleep due to increased relaxation and reduces anxiety. This has lead to much better and longer sleep, making me really productive at work. It is also helping a lot with anxiety from coffee, it is all but gone now. I just get the nice energy boost and focus boost with no anxiety effect. I usually take 1 or 2 pills of 225mg depending on how i feel. If I feel chill enough, I will only have 1 at night. If I feel the anxiety and neck tightness coming on from coffee I will take another one then.

Supplementation guide to stimulants. As I have some extensive experience with ADHD medication and stims (ADHD-PI diagnosed myself), over the years through research and trial and error I have built a list of supplements that works for mitigating side effects and minimizing comedown while enhancing their intended effects. I read a post about this a couple years ago and wanted to add my own twist to it in hopes of promoting harm reduction. The supplement + stim dosages here given are intended to be used for studying/productivity purposes, although this will still work if you are taking more than the therapeutic amount. If you have any inputs, advice or additions to the list I am happy to add them. Stimulants used for the purposes of studying SHOULD NOT be taken everyday to avoid dopaminergic downregulation. Three times a week at most is recommended to allow the body to regain homeostasis. Stimulants that these supplements can work for include: * Amphetamines (Adderall, Dexamphetamine, Methamphetamine) * Methylphenidates and Analogues (Focalin, Concerta/Ritalin, ethylphenidate, Isopropylphenidate) * Caffeine (Coffee, Tea, Caffeine Pills) (To a certain degree) * Eugeroics (Modafinil, Adrafinil, Armodafinil) (To a certain degree).
*L-Theanine (200mg/1-3x)\\***  (Reduces euphoria/ Reduces Jitters / Lowers Anxiety / Relaxation) (Anecdotal : Amazing supplement if you are an anxiety sensitive person, smooths out the experience) >[Effects of L-theanine on stress related symptoms and cognitive functions in healthy adults].

I think that, given the simple method that was used to select these top sentences, the results can be viewed as pretty successful. No neural networks were applied here, only word frequencies were used to generate sentence scores, but by reading the results we can actually learn a lot about green tea as a nootropic. My first observation would be that people mostly talk about l-theanine and not green tea. This makes sense, since the Nootropics subreddit is mostly about discussions of supplements in pill form. Another observation is that people try l-theanine hoping to reduce anxiety and improve sleep. Several posts state that l-theanine could be reducing anxiety by inhibiting glutamate, an excitatory neurotransmitter. One user mentioned that l-theanine helps them with THC-induced paranoia and proposed that THC increases glutamate in the brain and l-theanine in turn decreases anxiety by reducing available glutamate. Other users mention l-theanine helping them with the anxiety and jitteriness that follow drinking coffee. In terms of side effects, sedation and drowsiness were among those mentioned.

In conclusion, I was able to extract a good summary of green tea/l-theanine experiences using a simple word frequency method. Now that I have this code, I can just change the supplement keywords and create a similar summary for any other supplement. It’s definitely much faster than scrolling through the subreddit looking for relevant posts.

Reddit Depression Regimens – Topic Modeling cont’d

In the previous post we applied LDA topic modeling to text documents collected from the subreddit depressionregimens. Here I will continue with the results from the derived topic model, obtaining the most representative text for each topic. As was stated, the chosen model has ten topics, and LDA assumes that each document is composed of multiple topics, with each topic being assigned a probability. Each topic in turn is composed of multiple words, with each word assigned a probability.

Previous post: Reddit Depression Regimens – Topic Modeling

Since each document is composed of multiple topics, for each topic we can find a document with the highest probability for that topic, therefore that will be our most representative document.
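A minimal sketch of this step, assuming the trained gensim `lda` model, the bag-of-words `corpus`, and the list of raw `documents` from the previous post are already available:

```python
# Sketch: for each topic, keep the document where that topic has the highest probability.
from collections import defaultdict

# Assumes `lda` (trained gensim LdaModel), `corpus` (bag-of-words vectors) and
# `documents` (the original texts) are available from the previous post.
best_doc_per_topic = defaultdict(lambda: (0.0, None))  # topic_id -> (probability, doc index)

for doc_idx, bow in enumerate(corpus):
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        if prob > best_doc_per_topic[topic_id][0]:
            best_doc_per_topic[topic_id] = (prob, doc_idx)

for topic_id, (prob, doc_idx) in sorted(best_doc_per_topic.items()):
    if doc_idx is not None:
        print(f"Topic {topic_id + 1}: document {doc_idx} with probability {prob:.2f}")
        print(documents[doc_idx][:300], "...")
```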

Topic 1

(‘feel’, 0.040), (‘year’, 0.026), (‘thing’, 0.022), (‘symptom’, 0.020), (‘brain’, 0.019), (‘start’, 0.018), (‘time’, 0.017), (‘make’, 0.015), (‘issue’, 0.015), (‘lot’, 0.014)

Most representative post id with topic 1 probability of 0.45:
Full text here: https://www.reddit.com/r/depressionregimens/comments/gib17h

“Blank Mind Syndrome” – Sub group of specific symptoms including: – Loss of Internal Monologue, lack of coherent automatic thoughts, no track of time passage, lack of self insight – Depersonalisation/Derealization Feeling detached, having no “sense of self”, missing mental features, having no emotional autobiography, feeling as if every day is the same, loss of relationship or emotional attachments, feeling detached from external reality – Cognitive Decline, Loss of Visual imagination, inability to think in a deep or complex way, inability to hold information, loss of past learned skills and knowledge. – Complete Lack of goal-directed motivation, having no automatic self direction, no long term goals – Anhedonia – inability to enjoy or derive pleasure, nothing to look forward to, no bodily joy, satasfaction and so on – Lack of atmosphere/deepness of the outside reality, inability to appreciate beauty, things look flat and insignificant. All symptoms in various severity of course, It’s time to talk, what is this condition exactly, Did you suffer from depression your entire life? Is this episodic? how are you planning to solve it? how did you acquire it? had any professional been aware of it? Is it medication induced? Is there any outside outlet acknowledging this specific phenomena? How much time do you suffer from it? What were you diagnosed with? Was it sudden or progressively? Had anything helped at all? Would you join a group for people suffering the same condition? Is anyone interested in doing so? Please do respond!

Topic 2

people 0.044, depression 0.037, doctor 0.028, psychiatrist 0.020, make 0.020, bad 0.016, therapy 0.016, therapist 0.015, find 0.014, problem 0.013

Most representative post for this topic, with probability for topic 2 of 0.53: https://www.reddit.com/r/depressionregimens/comments/iij4tr

I talked to him today, he says all my problems are my choice and I choose to be lazy, suicidal, depressed etc. Is he right?,Dude… if he believes this then he must also believe that his career is total quackery. Get a new psychiatrist immediately. What a piece of shit.,absolutely not, please get a new psychiatrist!! you don’t choose to be suicidal or depressed, and in my experience, depression causes laziness more often than not. it’s worrisome that a professional outright said this to you and honestly I would report him if you can. that’s such a harmful thing to say to anyone suffering from such issues and to say it to the wrong person could be really catastrophic. i’m sorry he’s a dick to you, don’t listen to his bullshit. if it was so easy as to choose not to be depressed then nobody would fucking be depressed. it’s like he thinks people enjoy feeling this way ugh,OMG please please PLEASE never go back there. I once had a psychiatrist tell me I was gonna end up on a street corner with a sign (spoiler alert: I have a career and own a house). I got up and left and never looked back. Remember that YOU are a huge part of your mental health journey. It’s a collaborative effort between you, your psychiatrist, therapist (if you have one), and any other professional you choose to involve. You always have a say, and if something doesn’t seem right, you don’t have to go along with it. Your feelings are ALWAYS valid—don’t ever let anyone tell you differently. You are not alone in this. So many of us are depressed, anxious, suicidal, attention deficit, bipolar, lazy…these are NOT choices. Who would choose to be this way? There are plenty of helpful professionals out there, just make sure you screen them carefully. I believe in you and wish you well!!! …

Topic 3

day 0.037, thing 0.035, feel 0.033, make 0.024, find 0.017, good 0.016, exercise 0.016, eat 0.013, walk 0.013, lot 0.013

https://www.reddit.com/r/depressionregimens/comments/dztdw9

Topic probability: 0.53

Wanted to share something that I’ve recently found to help when I’m struggling to find motivation to complete basic chores. This one specifically deals with laundry, but it can apply to other tasks as well. If you’re like me, you can have laundry sitting there for weeks not being put away. The mountain of clothing is so overwhelming that I just ignore it all together. I’m also an all-or-nothing person; I just wait and wait until a good day when I’ll have enough energy to get it done. Those days are exceedingly rare, so that mountain of clothes will sit there for a loooong time, stressing me out and depressing me even more. I’m trying to switch my mindset to not feeling like I need to take on such giant tasks all at once. I decided to break up the tasks into smaller ones. For the mixed load of laundry that needed to be put away, I told myself I only need to put away the socks and underwear today. Then tomorrow I put away the shirts. The next day, fold pants, and the next everything else that goes on hangers. These smaller tasks only take like 5-10 minutes each, and it’s satisfying to see the pile of clothes dwindle every day versus sit there ominously for several weeks. If you’re feeling overwhelmed, break up your tasks into very small, easily attainable goals. Go easy on yourself and do what you can for the day. Even just the tiniest amount of progress is a good thing.,great advice. ​ Anytime you get anxiety over a task or a situation seems to complex or overwhelming. Just break in down into manageable pieces. Doing SOMETHING is always better than nothing even if it seems like too little or not enough or w/e.,I saw a meme about ‘anything worth doing is worth doing badly’ that addresses this. I try and remember that some days. Us perfectionists want to always do 100%. But in a lot of things (not everything, obviously, just as a general rule) doing 50% of the job, or 90% of the job, is way better then the 0% of the job we do because of that crippling dedication to doing 100%. Not an excuse for doing bad jobs on the stuff that really matters, but can be a much healthier way to approach doing general day-to-day stuff…

Topic 4

ssris 0.027, antidepressant 0.024, effect 0.024, drug 0.022, side_effect 0.020, depression 0.019, serotonin 0.016, prescribe 0.014, treat 0.013, ssri 0.012

Reddit post: https://www.reddit.com/r/depressionregimens/comments/bheg7d

Topic probability: 0.64

Hey y’all, this is a repost of the stickied post made by /u/jugglerofworlds, who appears to have deleted their account and their post along with it. I’ve edited it a little and will continue to keep it updated as needed. Suggestions are welcome. As the former post was, I’m trying to keep this confined to prescription medications, and not natural/herbal remedies (though I recognize that they definitely can be helpful means of treatment). I’m also typically avoiding medications that have been withdrawn from the market and thus aren’t really prescribed. In a future revision of this post I hope to add an additional column featuring which medications are available where, as some of these are approved in European countries but not in the U.S., and vice versa. # Icon key * ✔️ = approved to treat condition by a regulatory agency (FDA, EMA, ANSM, etc) * ➕ = approved as an adjunct treatment by a regulatory agency, to be used in combination with other medications to treat a condition (may or may not be used off-label as a monotherapy) * 🏷️ = Off label use; widely prescribed for condition but not necessarily rigorously studied for it * ⚠️ = experimental medication; in FDA Phase III trials or pending approval # Selective Serotonin Reuptake Inhibitors (SSRIs) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |citalopram|Celexa|✔️|🏷️| |escitalopram|Lexapro|✔️|✔️| |fluoxetine|Prozac|✔️|✔️| |fluvoxamine|Luvox/Luvox CR|✔️|✔️| |paroxetine|Paxil/Paxil CR|✔️|✔️| |sertraline|Zoloft|✔️|✔️| # Serotonin Modulator and Stimulators (SMS) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |vortioxetine|Trintellix|✔️|🏷️| |vilazodone|Viibryd|✔️|🏷️| # Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs) |Generic name|Brand name(s)|Treats depression|Treats anxiety| |:-|:-|:-|:-| |venlafaxine|Effexor/Effexor XR|✔️|✔️| |desvenlafaxine|Pristiq|✔️|🏷️| |duloxetine|Cymbalta|✔️|✔️| |milnacipran|Savella|✔️|✔️| |levomilnacipran|Fetzima|✔️|🏷️| |atomoxetine|Strattera|⚠️|⚠️| # Tricyclics (TCAs) ## TCAs with a preference for serotonin |Generic name|Brand name(s)|Treats depression|Treats anxiety|…

Topic 5

treatment 0.035, ketamine 0.028, year 0.022, work 0.021, drug 0.017, hope 0.015, hear 0.012, lithium 0.011, people 0.010, infusion 0.009

Reddit post: https://www.reddit.com/r/depressionregimens/comments/axtnj8

Topic probability: 0.58

https://www.washingtonpost.com/health/2019/03/06/biggest-advance-depression-years-fda-approves-novel-treatment-hardest-cases ​ The Food and Drug Administration approved a novel antidepressant late Tuesday for people with depression that does not respond to other treatments — the first in decades to work in a completely new way in the brain. The drug, a nasal spray called esketamine, has been eagerly anticipated by psychiatrists and patient groups as a powerful new tool to fight intractable depression. The spray acts within hours, rather than weeks or months as is typical for current antidepressants, and could offer a lifeline to about 5 million people in the United States with major depressive disorder who haven’t been helped by current treatments. That accounts for about one in three people with depression. “This is undeniably a major advance,” said Jeffrey Lieberman, a Columbia University psychiatrist. But he cautioned much is still unknown about the drug, particularly regarding its long-term use. “Doctors will have to be very judicious and feel their way along,” he said. The label for the drug will carry a black box warning – the most serious safety warning issued by the FDA. It will caution users they could experience sedation and problems with attention, judgment and thinking, and that there’s potential for abuse and suicidal thoughts. People who take esketamine will have to be monitored for at least two hours after receiving a dose to guard against some of these side effects…

Topic 6

work 0.053, anxiety 0.030, mg 0.025, bad 0.020, high 0.020, vitamin 0.018, diet 0.015, supplement 0.014, post 0.012, literally 0.011

Reddit post: https://www.reddit.com/r/depressionregimens/comments/alh4r3

Topic probability: 0.52

About 3 or 4 years ago, I developed a severe form of anxiety disorder where it manifested in panic attacks characterized by intense bouts of nausea, gagging, and retching. It didn’t usually get bad enough to get to vomiting, though it did in a few instances (in which I went to the hospital afterwards). My body responds to stress naturally by gagging and nausea. So imagine being anxious all the time but also NAUSEOUS 24/7, and I mean literally 24/7 without any respite. At times I was seriously considering suicide because of how bad I felt all the time every day. The whole thing started I think because I had to present at a large conference with thousands of people in attendance, and I had a very bad experience being insulted by some people at a previous iteration of this conference years ago. I was commuting to work one day (before the conference) and suddenly got this massive bout of nausea where I felt like I was dying. I realized however that this was my body telling me I have stagefright. I expected my nausea to evaporate once I finished speaking, as it usually would have in the past. Except that it didn’t. It stayed, and remained with me for years. I tried everything but avoided antidepressants for the longest time due to the bad rep they get. I tried the following medications: * Ginger – in various forms – for nausea (didn’t work) * Peppermint – in various forms – for nausea (didn’t work) * Ondansetron (zofran) – 4 mg; as needed – for nausea (didn’t work) * Chlordiazepoxide/clidinium bromide (librax) – 5 mg; once daily – for nausea and anxiety (didn’t work) * Pyridoxine/doxylamine (diclectin) – 10 mg pyridoxine, 10 mg doxylamine; 2 tablets at bedtime – for nausea (didn’t work) * Metoclopramide – 1 tablet daily – for nausea (didn’t work) * Domperidone – 10 mg; once daily – for nausea (didn’t work) * Propranolol – 10 mg; twice daily – for anxiety (didn’t work) * Prochlorazapine – 10 mg; twice daily – for nausea (didn’t work) * Lorazepam (Ativan) – 1 mg; 1 tablet at bedtime – for anxiety (didn’t work; just made me really sleepy) * Pantoprazole (Tecta) – 1 tablet daily – for nausea (didn’t work) * Dimenhydrinate (Gravol) – 1 tablet as needed – for nausea (didn’t work) * Nabilone (cesamet) – 0.5 mg as needed – for nausea (worked for nausea but not anxiety, and gave me a really uncomfortable high) * Clomipramine (Anafranil) – 10 mg. once daily – for anxiety (didn’t try properly due to side-effects) I was afraid even of getting out of my own house. I was afraid of meeting people. I was afraid of leaving my own room – the only place where I felt somewhat at ease and the nausea wasn’t THAT bad. The only thing that worked somewhat to relieve the nausea was chewing on things, whether that meant food at mealtimes, or fennel seeds, or sucking on mints/cough drops. So I carried mints and fennel seeds with me at all times no matter where I was – including in the washroom in my own house and even when I wanted to take a shower I had to have them nearby otherwise I would literally throw up in the shower. But these were not long-term cures to my problem and only a short alleviation of the symptoms (and not that effective if I was more anxious than usual). I somehow graduated from university with a degree in neuroscience and fought through this nausea-anxiety for 2 years doing so. My graduation ceremony – which was supposed to be a happy occasion – was marred by constant nausea and me going through at least 3 entire tins of mints because my body handles excitedness the same way as it does for anxiety. 
Literally nothing was working and I was at my wit’s end. So I went downtown Toronto and bought CBD oil from a dispensary. I only did this because I was literally desperate, even though I had never done any recreational drugs in my life upto that point (except caffeine), and even though I had a horrible experience with nabilone (synthetic THC for cancer patients to reduce their nausea) so I was really kind of anxious about even using that. But it worked…

Reddit Depression Regimens – Topic Modeling

Text data (top posts and top comments for those posts) was downloaded from the depressionregimens subreddit (https://www.reddit.com/r/depressionregimens/). Data was grouped by post id; in total there were 101 such ids, and therefore 101 text documents. After collecting the data, the following data cleaning steps were performed:

  • any emails were removed from text
  • urls were removed (http and www)
  • common contractions were expanded (‘ain’t >> ‘is not’; ‘bday’ >> ‘birthday’; ‘don’t’ >> ‘do not’; etc.)
  • new line characters were removed
  • single quotes were removed

After the data cleaning steps were complete, sentences were tokenized into words, and punctuation was removed. English stop words were removed from the documents. Python’s gensim.models.phrases.Phraser() was used to detect common phrases (bigrams). Lemmatization and part-of-speech (POS) tagging were then performed. Only lemmatized words with certain POS tags were kept, including nouns, adjectives, verbs, adverbs, and proper nouns. Proper nouns were kept in case medication or supplement names get tagged as such. We are interested in how Reddit users describe their experiences with certain psychotropic medications and supplements, and the chosen POS tags are the ones relevant for such descriptions.
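Here is a minimal sketch of this preprocessing pipeline. gensim's Phrases/Phraser is named in the post; spaCy for lemmatization and POS tagging is my assumption, as are the function and parameter names:

```python
# Sketch of the preprocessing pipeline; spaCy for lemmatization/POS is an assumption.
import gensim
import spacy
from gensim.models.phrases import Phraser, Phrases
from gensim.parsing.preprocessing import STOPWORDS

KEEP_POS = {"NOUN", "ADJ", "VERB", "ADV", "PROPN"}
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])


def preprocess(documents: list[str]) -> list[list[str]]:
    # Tokenize, lowercase, drop punctuation, and remove English stop words.
    tokenized = [[w for w in gensim.utils.simple_preprocess(doc) if w not in STOPWORDS]
                 for doc in documents]

    # Detect common bigrams (e.g. "side_effect", "mental_health") and join them.
    bigram = Phraser(Phrases(tokenized, min_count=5, threshold=100))
    tokenized = [bigram[doc] for doc in tokenized]

    # Lemmatize and keep only nouns, adjectives, verbs, adverbs, and proper nouns.
    lemmatized = []
    for doc in tokenized:
        spacy_doc = nlp(" ".join(doc))
        lemmatized.append([t.lemma_ for t in spacy_doc if t.pos_ in KEEP_POS])
    return lemmatized
```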

An example of an original post is presented below. A human reader can easily tell that the post is about ketamine and that the user had a positive experience with this treatment. We are interested in the verbs, adverbs, adjectives, and nouns that the author used to describe their experience.


I wanted to post this to give hope to those who need a little extra. I know how excruciating both having and battling treatment options for depression and anxiety can be. I’ve seen what I thought was rock bottom. I’ve been to my actual rock bottom, but I am free now.

One year ago, I was sitting in the recliner at my local ketamine clinic receiving my first infusion. The day before I had outlined my plan for suicide and had all my affairs in order, but a friend I had confided in about my depression had a “feeling” I was in a very dangerous place inside my head. I don’t know how she knew what I was planning, but thank goodness she had the foresight and the strength to push me to try one. more. thing. I had heard (and shared) quite a few podcasts from people who had been through treatment, administered the treatments, and even those who had been doing the research behind it all. had been chatting about ketamine as a potential treatment since nothing else had ever worked. She researched clinics, called them all to pick the best one, and made the appointment on an urgent basis getting me in that day.

She took me to the doctor, and after a while, I told him about my plan. I told him that I would give this a try, but this was my last try. After 25 years of my brain being a lab rat for every pill imaginable, years and years of therapy and everything else you can imagine, I was just so tired. He was sympathetic, caring and sat with me for quite a while. Then he started that first IV.

I won’t bore you with all the details of treatment (feel free to ask), but I can say that after the first treatment – one year ago today – I didn’t want to die anymore. I haven’t wanted to since. From time to time, the depression will creep in a little too much for comfort, but I have a lot of self-care tricks to help me get through it. And if It comes down to it, I go in for a booster treatment.
In the past year, I have had 11 infusions. The last 3 were to help me get off the last, and most difficult antidepressant that I ever took. Now, I’m on a very low dose of Lexapro, which I honestly doubt I even need. But I’m stable. I actually know what happiness feels like. And most importantly, I’m alive.
Thanks for reading.

After we perform the steps described above (data cleaning, stop-word removal, lemmatization, bigram extraction, and keeping only terms with specific POS tags), the post above results in the following:

want post give hope need little extra know excruciating battle treatment option depression anxiety see think rock bottom actual rock bottom free year ago sit recliner local ketamine clinic receive first infusion day outline plan suicide affair order friend confide depression feel dangerous place head know know plan thank goodness foresight strength push try thing hear share quite podcast people treatment administered treatment even research chat ketamine potential treatment else ever work research clinic call pick good make appointment urgent basis get day take doctor tell plan tell would give try last try year brain lab rat pill imaginable year year therapy else imagine tired sympathetic caring sit quite start first bear detail treatment feel free ask say first treatment year ago today want die anymore want time time depression creep little much comfort lot self_care trick help come go booster treatment year infusion last help last difficult antidepressant ever take low_dose lexapro honestly doubt even need stable actually know happiness feel importantly alive thank read

With regard to topic modeling, we are interested in the general topics that are discussed in this particular subreddit, and Latent Dirichlet Allocation (LDA) can be used for exactly this sort of task. LDA is an unsupervised method for finding topics in text data. Our text is composed of documents; in this case each document is a combination of a post and the top comments for a specific post id. LDA assumes that each document is composed of different topics and each topic is composed of different words. Therefore documents can contain overlapping topics and topics can contain overlapping words, but the probabilities for those topics and words will be different.

Since the problem is unsupervised, there are no labels, just text, and we don’t know how many topics there are in our subreddit. There is no exact formula to determine the optimal number of topics for an LDA model. One common way, which we will implement here, is to loop through different numbers of topics and calculate a coherence score for each, then choose the model with the highest coherence score. In this specific case, I implemented models for 2, 4, 8, …, 14 topics and plotted the corresponding coherence scores. As we can see from the chart, the highest value occurs when the number of topics is four; we also see peaks at 10 and 12 topics.
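A minimal sketch of the coherence loop, assuming the preprocessed token lists (`texts`) from the step above and gensim's CoherenceModel with the 'c_v' measure (the post does not say which coherence measure was used):

```python
# Sketch of the coherence-score loop over different numbers of topics.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# `texts` is the list of lemmatized token lists produced by the preprocessing step above.
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

coherence_by_k = {}
for k in range(2, 16, 2):  # candidate numbers of topics
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   random_state=42, passes=10)
    cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
    coherence_by_k[k] = cm.get_coherence()

best_k = max(coherence_by_k, key=coherence_by_k.get)
print(coherence_by_k, "-> highest coherence at", best_k, "topics")
```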

Now let’s see what the topics are.

Number of topics = 4

Topic 1: feel 0.037, thing 0.026, depression 0.022, make 0.021, life 0.019, time 0.013, bad 0.013, good 0.012, lot 0.010, depressed 0.010

Topic 2: day 0.042, work 0.024, feel 0.020, time 0.020, sleep 0.018, good 0.014, thing 0.012, start 0.010, bed 0.009, exercise 0.009

Topic 3: depression 0.017, effect 0.014, antidepressant 0.013, ssris 0.012, doctor 0.012, anxiety 0.010, side_effect 0.010, mg 0.010, drug 0.009, psychiatrist 0.009

Topic 4: depression 0.019, people 0.017, year 0.016, give 0.016, treatment 0.015, month 0.013, start 0.013, find 0.012, hope 0.012, ketamine 0.011

If we choose 10 topics:

Topic 1: feel 0.039, year 0.026, thing 0.022, symptom 0.020, brain 0.019, start 0.018, time 0.017, make 0.015, issue 0.015, lot 0.014

Topic 2: people 0.044, depression 0.037, doctor 0.028, psychiatrist 0.020, make 0.020, bad 0.016, therapy 0.016, therapist 0.015, find 0.014, problem 0.013

Topic 3: day 0.037, thing 0.035, feel 0.033, make 0.024, find 0.017, good 0.016, exercise 0.016, eat 0.013, walk 0.013, lot 0.013

Topic 4: ssris 0.027, antidepressant 0.024, effect 0.024, drug 0.022, side_effect 0.020, depression 0.019, serotonin 0.016, prescribe 0.014, treat 0.013, ssri 0.012

Topic 5: treatment 0.035, ketamine 0.028, year 0.022, work 0.021, drug 0.017, hope 0.015, hear 0.012, lithium 0.011, people 0.010, infusion 0.009

Topic 6: work 0.053, anxiety 0.030, mg 0.025, bad 0.020, high 0.020, vitamin 0.018, diet 0.015, supplement 0.014, post 0.012, literally 0.011

Topic 7: time 0.033, make 0.028, depression 0.015, long 0.015, call 0.014, depressed 0.014, feeling 0.013, people 0.013, read 0.013, focus 0.013

Topic 8: sleep 0.053, day 0.037, time 0.030, bed 0.024, start 0.024, feel 0.023, morning 0.020, wake 0.018, night 0.014, hour 0.013

Topic 9: experience 0.039, day 0.030, feel 0.029, depression 0.024, mind 0.020, give 0.017, month 0.017, good 0.015, week 0.013, back 0.012

Topic 10: life 0.062, feel 0.030, depression 0.029, thing 0.020, find 0.019, good 0.017, live 0.017, bad 0.014, change 0.014, year 0.013

I think that even with this small sample size (101 top posts and corresponding top comments), the LDA results provide us with a good understanding of what users discuss in the depressionregimens subreddit. There are discussions about life, feeling depressed, how long the depression has been going on (mentions of week/month/year), mentions of how the day goes (Topic 7), mentions of specific treatments (Topic 4), supplements (Topic 5), SSRIs and side effects (Topic 3), and exercise (Topic 2).

It’s possible to then apply the chosen model to each document in order to obtain the topic distribution for each document. For example, we can choose the model with 10 topics, obtain the topic distribution for each document, and determine the topic with the maximum probability for each document. Then we can select sample documents that have the highest probability for a given topic. If we choose topic 2, which contains the following word distribution:

(‘people’, 0.04), (‘depression’, 0.038), (‘doctor’, 0.028), (‘psychiatrist’, 0.020),
(‘make’, 0.020), (‘bad’, 0.016), (‘therapy’, 0.016), (‘therapist’, 0.015), (‘find’, 0.014),
(‘problem’, 0.013)

We can find documents that have the maximum probability for topic 2:


“This might be an unconventional treatment considering that many of us post about their experience with various drugs.
I myself struggled with mental health in the past. I can say my mental health issues in the past were to 90% biological (hormonal problems). Once I treated the causes, over time the upwards spiral in my personal wellbeing (and life in general) started again.
In early twenties, my life was starting to go down the gutter. My life started to fall apart in every domain. I was severely depressed. I found out that some of my hormones were very low. I started hormone replacement. Whereas before my life was a nightmare, it has been great ever since. I could even get off the SSRIs I was on.
I wrote an article about my journey. How Hormones Destroyed and Saved My Life.
My dream is to live in a world where no one is held back from living an at least decent life the way I was. Even though not my fault, it is my life. And thus my responsibility. Without accepting and acting on that I just don´t know where I would be today. For sure I wouldn´t be writing this. Hope you find value in it… “

(Can read full text at https://www.reddit.com/r/depressionregimens/comments/lef32x )

The topic distribution for this document is as follows:
[1: 0.074, 2: 0.338, 3: 0.032, 4: 0.069, 5: 0.083, 6: 0.084, 7: 0.052, 8: 0.054, 9: 0.073, 10: 0.153]

Lemmatized text:

[‘may’, ‘unconventional’, ‘treatment’, ‘consider’, ‘many’, ‘us’, ‘post’, ‘experience’, ‘various’, ‘drug’, ‘struggle’, [‘may’, ‘unconventional’, ‘treatment’, ‘consider’, ‘many’, ‘us’, ‘post’, ‘experience’, ‘various’, ‘drug’, ‘struggle’, ‘mental_health’, ‘say’, ‘mental_health’, ‘issue’, ‘biological’, ‘hormonal’, ‘problem‘, ‘treat’, ’cause’, ‘time’, ‘upwards’, ‘spiral’, ‘personal’, ‘wellbeing’, ‘life’, ‘general’, ‘start’, ‘early’, ‘twenty’, ‘life’, ‘start’, ‘go’, ‘gutter’, ‘life’, ‘start’, ‘fall’, ‘apart’, ‘domain’, ‘severely_depresse’, ‘find‘, ‘hormone’, ‘low’, ‘start’, ‘hormone’, ‘replacement’, ‘life’, ‘nightmare’, ‘great’, ‘ever’, ‘since’, ‘could’, ‘even’, ‘ssris’, ‘write’, ‘article’, ‘journey’, ‘hormone’, ‘destroy’, ‘save’, ‘life’, ‘dream’, ‘live’, ‘world’, ‘hold’, ‘back’, ‘live’, ‘least’, ‘decent’, ‘life’, ‘way’, ‘even’, ‘fault’, ‘life’, ‘thus’, ‘responsibility’, ‘accept’, ‘act’, ‘know’, ‘would’, ‘today’, ‘sure’, ‘write’, ‘hope’, ‘find‘, ‘value’, ‘opinion’, ‘replace’, ‘hormone’, ‘deficient’, ‘far’, ‘natural’, ‘also’, ‘effective’, ‘artificial’, ‘med’, ‘however’, ‘believe’, ‘hormone’, ‘deficiency’, ‘may’, ‘much’, ‘common’, ‘assume’, ‘people‘, ‘never’, ‘get’, ‘hormone’, ‘check’, ‘often’, ‘even’, ‘life’, ‘want’, ‘give’, ‘head’, ‘other’, ‘pull’, ‘trigger’, ‘medication’, ‘claim’, ‘medication’, ‘work’, ‘hormone’, ‘check’, ‘opportunity’, ‘cost’, ‘high’, ‘similar’, ‘experience’, ‘hormone’, ‘hormone’, ‘dangerous’, ‘play’, ‘make‘, ‘sure’, ‘talk’, ‘doctor‘, ‘monitor’, ‘doctor‘, ‘lock’, ‘post’, ‘people‘, ‘would’, ‘see’, ‘unlocked’, ‘pm’, ‘otherwise’, ‘leave’, ‘lock’, ‘play’, ‘hormone’, ‘medical’, ‘supervision’, ‘highly’, ‘detrimental’, ‘health’, ‘thyroid’, ‘hormone’, ‘deficient’, ‘know’, ‘other’, ‘start’, ‘take’, ‘mcg’, ‘thyroxine’, ‘treat’, ‘hypothyroidism’, ‘run’, ‘family’, ‘fog’, ‘seem’, ‘lift’, ‘agree’, ‘hormone’, ‘underrated’, ‘come’, ‘depression‘, ‘thank’, ‘share’, ‘hormone’, ‘specifically’, ‘testosterone’, ‘direct’, ‘correlation’, ‘dopamine’, ‘high’, ‘test’, ‘high’, ‘dopamine’, ‘vice’, ‘versa’, ‘generally’, ‘testerone’, ‘wellbutrin’, ‘increase’, ‘libido’, ‘endocrine’, ‘system’, ‘research’, ‘seem’, ‘lag’, ‘research’, ‘treatment’, ‘know’, ‘million’, ‘could’, ‘suffer’, ‘needlessly’, ‘ignore’, ‘op’, ‘entire’, ‘post’, ‘structure’, ‘sway’, ‘people‘, ‘way’, ‘link’, ‘closing’, ‘paragraph’, ‘also’, ‘spamme’, ‘numerous’, ‘time’, ‘different’, ‘thing’, ’cause’, ‘depression‘, ‘know’, ‘enough’, ‘dark’, ‘age’, ‘exclusive’, ‘seratonin’, ‘hormone’, ‘receptor’, ‘regulation’, ‘drug’, ‘abuse’, ‘dopamine’, ‘ach’, ‘brain’, ‘damage’, ‘gaba’, ‘glutamate’, ‘imbalance’, ‘relate’, ‘several’, ‘brain’, ‘region’, ‘receptor’, ‘site’, ‘together’, ‘hormone’, ‘conversion’, ‘chain’, ‘adhd’, ‘bp’, ‘level’, ‘bdnf’, ‘several’, ‘type’, ‘disease’, ‘additionally’, ‘low’, ‘end’, ‘hormone’, ‘scale’, ‘total’, ‘free’, ‘may’, ‘feel’, ‘symptom’, ‘other’, ‘would’, ‘conversely’, ‘man’, ‘may’, ‘almost’, ‘nil’, ‘estrogen’, ‘high’, ‘estrogen’, ‘side_effect’, ‘decent’, ‘doctor‘, ‘full’, ‘blood’, ‘panel’, ‘hormone’, ‘panel’, ‘include’, ‘ask’, ‘depend’, ‘free’, ‘go’, ‘private’, ‘cost’, ‘uk’, ‘take’, ‘important’, ‘relative’, ‘commit’, ‘find‘, ‘thyroid’, ‘level’, ‘way’, ‘back’, ‘thyroid’, ‘problem‘, ‘handle’, ‘psych’, ‘med’, ‘need’, ‘depression‘, ‘probably’, ‘lifelong’, ‘become’, ‘unmanageable’, ‘thyroid’, ‘cancer’, ‘luckily’, ‘old’, ‘easy’, ‘catch’, ‘get’, ‘point’, ‘hormone’, ‘low’, ‘find‘, ‘hormone’, ‘check’, ‘yearly’, ‘perfectly’, ‘normal’, ‘even’, ‘high’, ‘yet’, ‘still’, ‘depressed’, ‘hormone’, ‘may’, ‘help’, ‘people‘, ‘many’, ‘still’, 
‘depress’, ‘physiological’, ‘duck’, ‘row’, ‘infuriate’, ‘many’, ‘doctor‘, ‘refuse’, ‘prescribe’, ‘hrt’, ‘guess’, ‘taboo’, ‘medical’, ‘school’, ‘pull’, ‘tooth’, ‘find‘, ‘decent’, ‘doctor‘, ‘even’, ‘consider’, ‘apparently’, ‘fear’, ‘cancer’, ‘induce’, ‘hormone’, ‘frankly’, ‘rather’, ‘live’, ‘good’, ‘life’, ‘even’, ‘mean’, ‘get’, ‘cancer’, ‘live’, ‘cancer’, ‘free’, ‘life’, ‘mentally’, ‘miserable’, ‘post’, ‘multiple’, ‘account’, ‘whole’, ‘time’, ‘person’, ‘post’, ‘often’, ‘article’, ‘different’, ‘account’, ‘sometimes’, ‘claim’, ‘last’, ‘year’, ‘biology’, ‘student’, ‘other’, ‘last’, ‘year’, ‘medicine’, ‘student’, ‘post’, ‘lame’, ‘excuse’, ‘lure’, ‘costumer’, ‘hormetheu’, ‘thank’, ‘share’, ‘disregard’, ‘irrational’, ‘post’, ‘intelligent’, ‘enough’, ‘determine’, ‘right’, ‘see’, ‘sort’, ‘ground’, ‘swell’, ‘business’, ‘activity’, ‘even’, ‘touch’, ‘consultation’, ‘hormone’, ‘way’, ‘business’, ‘s’, ‘even’, ‘well’, ‘talk’, ‘get’, ‘free’, ‘professsional’, ‘guidance’, ‘think’, ‘people‘, ‘stick’, ‘depression‘, ’cause’, ‘people‘, ‘pursue’, ‘treatment’, ‘may’, ‘save’, ‘life’, ‘know’, ‘firsthand’, ‘appropriate’, ‘way’, ‘respond’, ‘tell’, ‘support’, ‘other’, ‘say’, ‘mother’, ‘first’, ‘tell’, ‘hit’, ‘would’, ‘sit’, ‘kitchen’, ‘table’, ‘cry’, ‘uncontrollably’, ‘start’, ‘hrt’, ‘right’, ‘take’, ‘nurse’, ‘year’, ‘tortuous’, ‘severe’, ‘depression‘, ‘ask’, ‘do’, ‘hormone’, ‘panel’, ‘flabbergast’, ‘go’, ‘lowt’, ‘men’, ‘health’, ‘center’, ‘addition’, ‘find‘, ‘severely’, ‘low’, ‘receive’, ‘great’, ‘man’, ‘health’, ‘care’, ‘know’, ‘funny’, ‘deduce’, ‘man’, ‘mid’, ‘life’, ‘crisis’, ‘hormone’, ‘imbalance’, ‘likely’, ‘low’, ‘get’, ‘ball’, ‘bust’, ‘buy’, ‘corvette’, ‘woman’, ‘get’, ‘sympathy’, ‘go’, ‘change’, ‘enough’, ‘question’, ‘come’, ‘first’, ‘opinion’, ‘testosterone’, ‘brain’, ‘get’, ‘testosterone’, ‘shot’, ‘help’, ‘put’, ‘dent’, ‘depression‘, ‘make‘, ‘feel’, ‘well’, ‘still’, ‘leave’, ‘pretty’, ‘severe’, ‘depression‘, ‘admittedly’, ‘hormone’, ‘vitamin’, ‘could’, ‘do’, ‘aggressively’, ‘recently’, ‘do’, ‘put’, ‘brain’, ‘glide’, ‘path’, ‘depression‘, ‘amazing’, ‘think’, ‘fix’, ‘fix’, ‘brain’, ‘still’, ‘aggressively’, ‘pursue’, ‘low’, ‘hear’, ‘cortisol’, ‘kill’, ‘testosterone’]