Plant of the Day – Ghost Pipe

Ghost Pipe, also known as Indian Pipe (Monotropa uniflora), is a non-photosynthetic flowering plant. It is not a fungus, yet it does not obtain its food the way a typical flowering plant does, which makes it a particularly intriguing member of the plant world. It can be found in various parts of North America, from Alaska to California in the west and from Nova Scotia to Florida in the east. It also grows in parts of Asia, such as the Himalayas and northeastern Siberia. It prefers shaded, moist, mature forests, especially those with abundant leaf litter and decaying organic matter, which supports the mycorrhizal fungi it relies on. The plant often grows under trees such as oaks and pines, benefiting from the fungi that form mutualistic relationships with these trees.

An individual Indian pipe plant is ephemeral in nature. After it emerges from the ground, it flowers, sets seed, and then decays over a relatively short period, often within a week or two. The perennial part of the plant, its underground structures (like rhizomes), can live for several years. These structures can give rise to above-ground shoots annually for many years, under suitable conditions.

I took the photo below this August. This Ghost Pipe was found in the beautiful Meisel Woods Conservation Area, Ontario, Canada.

  • Lacks Chlorophyll: Unlike most plants, Monotropa uniflora lacks chlorophyll, the green pigment that enables photosynthesis. This absence gives it its distinctive pale or translucent appearance.
  • Mycotrophic Nutrition: Since it can’t produce its own food through photosynthesis, Monotropa uniflora obtains its nutrients through a unique relationship with mycorrhizal fungi. These fungi have a mutualistic relationship with certain trees, exchanging nutrients with them. So, indirectly, Monotropa uniflora is obtaining nutrients from the trees via the fungi. In essence, it “eats” by parasitizing the fungi that are associated with tree roots.
  • Fungi Partners: The primary group of fungi that associate with Monotropa uniflora are from the Russulaceae family. This includes fungi that are ectomycorrhizal with trees, forming a sheath around tree roots where nutrient exchange occurs.
  • Reproduction: It’s a flowering plant, so it reproduces via seeds. The flowers are typically pollinated by bees. After pollination, the flower stands erect from its previously nodding position. The seeds of Monotropa uniflora contain very little energy reserve, so they require the presence of the right fungus to germinate. Once a seed comes into contact with a compatible fungus, the fungus will invade the seed and initiate the growth of the plant.
  • Indirect Relationship with Trees: The ectomycorrhizal fungi have a symbiotic relationship with trees. In this mutualistic relationship:
    – Trees provide the fungi with sugars produced through photosynthesis.
    – The fungi provide trees with essential minerals and water from the soil.
    – Monotropa uniflora exploits this system by drawing nutrients from the fungi, which indirectly means it’s obtaining resources from the trees, although it doesn’t directly parasitize the trees themselves.
  • Dependence on the Fungi-Tree Relationship: It’s essential to understand that without the fungi-tree relationship, Monotropa uniflora couldn’t survive. The plant is entirely dependent on the organic compounds produced by photosynthetic plants (trees) and transferred through the mycorrhizal fungi.

In the relationship between Monotropa uniflora and the fungi it associates with, it might seem that the fungi get the short end of the stick, since the Indian pipe is effectively parasitizing the fungus. However, when looking at this association in the broader context of the forest ecosystem, some potential indirect benefits or considerations arise:

Seed Germination and Fungal Propagation: The initial interaction between the fungi and the Indian pipe seeds can promote fungal colonization. The process of seed germination and growth might offer conditions conducive for the fungal hyphae to spread and grow.

Promotion of Fungal Diversity: Interactions with plants like Monotropa uniflora might play a role in promoting fungal diversity in forest ecosystems. A diverse mycorrhizal network can enhance soil health and overall forest resilience.

Fructose Malabsorption – Applying the Luhn algorithm for text summarization

The Luhn algorithm is a text summarization technique that uses statistical properties of the text to identify and extract the most important sentences from a document. The algorithm was developed by H.P. Luhn in the 1950s, and is still widely used in various forms today.

The Luhn algorithm works by first analyzing the frequency of each word in the document, and then assigning a score to each sentence based on the frequency of the words it contains. Sentences that contain words that are more frequent in the document as a whole are considered to be more important, and are assigned higher scores. The algorithm then selects the top-scoring sentences and concatenates them together to form the summary. The length of the summary is usually determined in advance by the user, and the algorithm selects the most important sentences that fit within that length limit.

It works by identifying the most salient or important sentences in a document based on the frequency of important words and their distribution within each sentence. First, the algorithm removes stopwords, which are common words such as “the”, “and”, and “a” that do not carry much meaning. Additionally, one could apply stemming, which reduces words to their base or root form. For example, “likes” and “liked” are reduced to “like”. Then, the algorithm looks for important words in each sentence. These are typically nouns, verbs, and adjectives that carry the most meaning. The specific method for identifying important words may vary depending on the implementation of the algorithm, but in general, they are selected based on their frequency and relevance to the topic of the text.

The algorithm counts the number of important words in each sentence, squares that count, and divides it by the span, that is, the number of words from the first to the last occurrence of an important word, inclusive. This gives a measure of how densely the important words are distributed within the sentence. Finally, the algorithm ranks the sentences based on their scores, with the highest scoring sentences considered the most important and selected for the summary.

Here are the step-by-step instructions for the Luhn algorithm:

  1. Preprocess the text: Remove any stop words, punctuation, and other non-textual elements from the document, and convert all the remaining words to lowercase.
  2. Calculate the word frequency: Count the number of occurrences of each word in the document, and store this information in a frequency table.
  3. For each sentence, calculate the score by:
    a. Identifying the significant words (excluding stop words) that occur in the sentence.
    b. Ordering the significant words by their position in the sentence.
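    c. Grouping significant words into clusters in which adjacent significant words are separated by no more than a few non-significant words, and measuring the length of each cluster in words (the “span”).
    d. Calculating a score for each cluster as the square of the number of significant words it contains divided by its span, and taking the highest cluster score as the score of the sentence.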
  4. Select the top-scoring sentences: Sort the sentences in the document by their score, and select the top-scoring sentences up to a maximum length L. The length L is typically chosen by the user in advance, and represents the maximum number of words or sentences that the summary can contain.
  5. Generate the summary: Concatenate the selected sentences together to form the summary.
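
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the summary in this post: the tiny stop word list, the regex-based sentence splitter, the max_gap parameter (how many non-significant words may sit between two significant words of the same cluster), and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "by", "be", "this"}


def tokenize(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def significant_words(text, top_n=25):
    """Return the top_n most frequent non-stop words of the whole text."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(top_n)}


def luhn_score(sentence, significant, max_gap=4):
    """Best cluster score: (significant words in cluster)^2 / cluster span."""
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            # close the current cluster and start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text, top_n_words=25, n_sentences=3):
    """Pick the n_sentences best-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    significant = significant_words(text, top_n_words)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: luhn_score(sentences[i], significant),
                    reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```

For example, in the sentence "fructose malabsorption causes bloating" with the significant words "fructose" and "bloating", the single cluster spans four words and contains two significant words, so the score is 2² / 4 = 1.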

Below I summarize the topic of fructose malabsorption by generating a summary using the Luhn algorithm. To create the summary, I selected several articles from sources like Wikipedia and PubMed. The important words were selected based on their total frequency in all of the text. I chose the top 25 words to focus on, and then used the algorithm to identify the most important sentences based on the frequency and distribution of these words. The summary was generated using the top 15 sentences.

Symptoms and signs of Fructose malabsorption may cause gastrointestinal symptoms such as abdominal pain, bloating, flatulence or diarrhea. Although often assumed to be an acceptable alternative to wheat, spelt flour is not suitable for people with fructose malabsorption, just as it is not appropriate for those with wheat allergies or celiac disease. However, fructose malabsorbers do not need to avoid gluten, as those with celiac disease must. Many fructose malabsorbers can eat breads made from rye and corn flour. This can cause some surprises and pitfalls for fructose malabsorbers. Foods (such as bread) marked “gluten-free” are usually suitable for fructose malabsorbers, though they need to be careful of gluten-free foods that contain dried fruit or high fructose corn syrup or fructose itself in sugar form. Food-labeling Producers of processed food in most or all countries, including the US, are not currently required by law to mark foods containing “fructose in excess of glucose”.

Stone fruit: apricot, nectarine, peach, plum (caution – these fruits contain sorbitol);Berry fruit: blackberry, boysenberry, cranberry, raspberry, strawberry, loganberry; Citrus fruit: kumquat, grapefruit, lemon, lime, mandarin, orange, tangelo; Other fruits: ripe banana, jackfruit, passion fruit, pineapple, rhubarb, tamarillo.

The fructose and glucose contents of foods listed on the Australian food standards would appear to indicate that most of the listed foods have higher fructose levels.

Glucose enhances absorption of fructose, so fructose from foods with fructose-to-glucose ratio <1, like white potatoes, are readily absorbed, whereas foods with fructose-to-glucose ratio >1, like apples and pears, are often problematic regardless of the total amount of fructose in the food.

Foods with high fructose-to-glucose ratio.

Dietary supplements of xylose isomerase may improve some symptoms of fructose malabsorption, although there is currently only a single scientific study available.

This can lead to issues such as bloating, heartburn and constipation.


It is called a hydrogen breath test and is the method currently used for a clinical diagnosis.

People with fructose malabsorption absorb less than 25 g per sitting.

Fructose malabsorption should be considered in patients with symptoms of depression and disturbances of tryptophan metabolism. This prospective cross-sectional study aimed to investigate the potential association between primary-adult lactose malabsorption, fructose malabsorption, tryptophan (TRP) metabolism and the presence of depressive signs.

High intestinal fructose concentration seems to interfere with L-tryptophan metabolism, and it may reduce availability of tryptophan for the biosynthesis of serotonin (5-hydroxytryptamine).

After obtaining the top sentences above, I used them as input to ChatGPT to generate another summary. The summary ended up with some mistakes, such as “People with this condition should avoid foods like spelt flour that contain fructose.” Spelt does not contain fructose; it contains fructans. But it was still a fun exercise.

Fructose malabsorption can cause gastrointestinal symptoms such as abdominal pain, bloating, flatulence, and diarrhea. People with this condition should avoid foods like spelt flour that contain fructose. However, they do not need to avoid gluten, and many can eat breads made from rye and corn flour. Foods marked “gluten-free” are generally suitable for fructose malabsorbers, but they need to be careful of gluten-free foods that contain dried fruit or high fructose corn syrup. Certain fruits such as apricots, nectarines, peaches, and plums can cause issues due to their sorbitol content. The fructose-to-glucose ratio of foods can also play a role in absorption, with foods with a higher ratio often being problematic. A hydrogen breath test is used for clinical diagnosis, and dietary supplements of xylose isomerase may help with symptoms. Fructose malabsorption may also be associated with depressive signs due to the interference with L-tryptophan metabolism.

Filling gluten-free breakfast muffins

Kitchen utensils

I have tried out various kitchen utensils, including muffin pans, loaf pans, a mixer, and a blender, among others. While these items can be useful for baking and food preparation, I found that many of them took too long to clean, making them impractical for everyday use. As a result, I ended up donating all of the utensils that I wasn’t using. The only items that I found to be useful and easy to clean are silicone baking sheets and silicone muffin cups. These utensils are made from a non-stick material that eliminates the need for oil or butter, and silicone also distributes heat evenly.

Silicone baking trays/sheets are primarily made of silicone, which is a synthetic polymer material composed of silicon, oxygen, carbon, and hydrogen. However, some silicone baking trays/sheets may contain additional materials, such as fiberglass or nylon, to provide additional strength and durability.

Fiberglass is sometimes added to the silicone material to reinforce the tray/sheet and prevent it from bending or warping during use. This is especially important for larger or thicker trays/sheets, which may be more prone to deformation under high temperatures or heavy use. Nylon is another material that may be added to the silicone to provide additional durability and heat resistance. Nylon-reinforced silicone trays/sheets are often used in commercial kitchens, as they are more resistant to wear and tear and can withstand high-volume use.

When exposed to heat, silicone responds by retaining its shape and structure without melting or warping. This means that it can be used in high-temperature environments without the risk of deformation or damage. Silicone is also an excellent insulator, which means that it helps to distribute heat evenly across the surface of the tray/sheet.

The other useful utensils that I regularly use are metal bowls for mixing, a metal whisk, and silicone spatulas.

Breakfast muffins


The muffins that I came up with require two bowls for mixing, a metal whisk, a spoon, and silicone muffin cups. These muffins are gluten-free. They are made from gluten-free oats and oat flour and/or sorghum flour. I also add flax seeds and chia seeds for fiber and nutrition. Sweetness comes from ripe bananas, apple sauce, and honey. I find these muffins very filling; sometimes my breakfast consists of just several muffins and a coffee with oat milk. The recipe also includes two eggs. Eggs, flax seeds, and chia seeds contain protein. The flour contains carbs. The eggs and butter add fat. So I would say this is a balanced breakfast. The recipe is below:

Bowl 1 – dry ingredients:
1 1/2 cups gluten-free oats
1 1/2 cups oat/sorghum flour
1/4 cup ground flax seeds
1 teaspoon chia seeds
1/4 teaspoon salt
1 teaspoon xanthan gum
3/4 teaspoon baking soda

Mix together all of the dry ingredients in bowl 1

Bowl 2 – wet ingredients:
2 mashed ripe bananas (start with this step first – mash the bananas in bowl 2)
2 eggs
1/2 cup apple sauce
5 tablespoons of melted butter/vegan butter
3 tablespoons peanut butter
3 tablespoons honey
1 cup blueberries

Mix together all of the wet ingredients with a metal whisk, starting with mashing the ripe bananas first. Make sure that everything that you mix is at room temperature. Add blueberries last.

Pour the mixture from bowl 2 into bowl 1 and, again, mix everything together. Let the final mixture stand for 15 minutes at room temperature. While the mixture is standing, you can turn on the oven to 350 F, so that it starts preheating.

Place the silicone muffin cups on a tray. After 15 minutes, the dough is ready. Use a spoon to pour the mixture into the silicone muffin cups. Place the tray with the muffin cups into the oven. Bake at 350 F for 45 minutes.

Sequence-to-Sequence and Attention

What are Sequence-to-Sequence models?

Sequence-to-Sequence (Seq2Seq) models are a type of neural network architecture used for natural language processing tasks, such as machine translation, text summarization, and conversational modeling. The basic idea behind Seq2Seq models is to map a variable-length input sequence to a variable-length output sequence.

Seq2Seq models consist of two parts: an encoder and a decoder. The encoder takes an input sequence, such as a sentence, and generates a fixed-length representation of it, called the context vector. The decoder then takes the context vector as input and generates the output sequence, such as a translation of the input sentence into another language. Both encoder and decoder contain multiple recurrent units that take one element as input. The encoder processes the input sequence one word at a time and generates a hidden state h_i for each timestep i. Finally, it passes the last hidden state h_n to the decoder, which uses it as the initial state to generate the output sequence.

In a Seq2Seq model, the hidden state refers to the internal representation of the input sequence that is generated by the recurrent units in the encoder or decoder. The hidden state is a vector of numbers that represents the “memory” of the recurrent unit at each timestep.

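Let’s consider a commonly used recurrent unit, the Long Short-Term Memory (LSTM) cell. An LSTM cell takes as input the current input vector x_t, the previous hidden state h_{t-1}, and the previous cell state c_{t-1}, and produces the current hidden state h_t as output. The LSTM cell can be represented mathematically as follows:

i_t = sigmoid(W_{ix} x_t + W_{ih} h_{t-1} + b_i)
f_t = sigmoid(W_{fx} x_t + W_{fh} h_{t-1} + b_f)
o_t = sigmoid(W_{ox} x_t + W_{oh} h_{t-1} + b_o)
c_t = f_t * c_{t-1} + i_t * tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)
h_t = o_t * tanh(c_t)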

Here, W_{ix}, W_{ih}, W_{fx}, W_{fh}, W_{ox}, W_{oh}, W_{cx}, and W_{ch} are weight matrices, b_i, b_f, b_o, and b_c are bias vectors, sigmoid is the sigmoid activation function, and tanh is the hyperbolic tangent activation function.

At each timestep t, the LSTM cell computes the input gate i_t, forget gate f_t, and output gate o_t based on the current input x_t and the previous hidden state h_{t-1}. It then updates the cell state c_t from these gates and the previous cell state c_{t-1}. The current hidden state h_t is computed from the current cell state c_t and the output gate o_t. In this way, the hidden state h_t represents the internal memory of the LSTM cell at each timestep t. It contains information about the current input x_t as well as the previous inputs and hidden states, which allows the LSTM cell to maintain a “memory” of the input sequence as it is processed by the encoder or decoder.
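
To make the gate computations concrete, here is a minimal NumPy sketch of a single LSTM step and of an encoder loop built from it. It assumes the standard LSTM formulation with the weight and bias names used above; the random initialization, the parameter dictionary, and the function names are hypothetical choices for the illustration, not part of any particular library.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for the four LSTM transforms."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "ifoc":  # input, forget, output, candidate cell
        p[f"W_{gate}x"] = rng.normal(0.0, 0.1, (hidden_size, input_size))
        p[f"W_{gate}h"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        p[f"b_{gate}"] = np.zeros(hidden_size)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep: returns the new hidden state and cell state."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])
    c_tilde = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde  # keep part of the old memory, add new
    h_t = o_t * np.tanh(c_t)            # expose a gated view of the memory
    return h_t, c_t


def encode(inputs, p, hidden_size):
    """Run the cell over a sequence and return the last hidden state h_n."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, p)
    return h
```

Whatever the length of the input sequence, encode returns a vector of size hidden_size, which is the fixed-length representation that a basic encoder hands to the decoder.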

Encoder and decoder

The Seq2Seq model consists of two parts: an encoder and a decoder. Both of these parts contain multiple recurrent units that take one element as input. The encoder processes the input sequence one word at a time and generates a hidden state h_i for each timestep i. Finally, it passes the last hidden state h_n to the decoder, which uses it as the initial state to generate the output sequence.

The final hidden state of the encoder represents the entire input sequence as a fixed-length vector. This fixed-length vector serves as a summary of the input sequence and is passed on to the decoder to generate the output sequence. The purpose of this fixed-length vector is to capture all the relevant information about the input sequence in a condensed form that can be easily used by the decoder. By encoding the input sequence into a fixed-length vector, the Seq2Seq model can handle input sequences of variable length and generate output sequences of variable length.

The decoder takes the fixed-length vector representation of the input sequence, called the context vector, and uses it as the initial hidden state s_0 to generate the output sequence. At each timestep t, the decoder produces an output y_t and an updated hidden state s_t based on the previous output and hidden state. In its simplest form, this can be represented mathematically as follows:

s_t = f(W_s y_{t-1} + U_s s_{t-1} + V_s c + b_s)
y_t = g(s_t)

Here, W_s, U_s, and V_s are weight matrices, b_s is a bias vector, c is the context vector (from the encoder), and f and g are activation functions. The decoder uses the previous output y_{t-1} and hidden state s_{t-1} as input to compute the updated hidden state s_t, which depends on the current input and the context vector. The updated hidden state s_t is then used to compute the current output y_t, which depends on the updated hidden state s_t. By iteratively updating the hidden state and producing outputs at each timestep, the decoder can generate a sequence of outputs that is conditioned on the input sequence and the context vector.

What is the context vector, where does it come from?

In a Seq2Seq model, the context vector is a fixed-length vector representation of the input sequence that is used by the decoder to generate the output sequence. The context vector is computed by the encoder and is passed on to the decoder as the final hidden state of the encoder.

What is a transformer? How are encoders and decoders used in transformers?

The Transformer architecture consists of an encoder and a decoder, similar to the Seq2Seq model. However, unlike the Seq2Seq model, the Transformer does not use recurrent neural networks (RNNs) to process the input sequence. Instead, it uses a self-attention mechanism that allows the model to attend to different parts of the input sequence at each layer.

In the Transformer architecture, both the encoder and the decoder are composed of multiple layers of self-attention and feedforward neural networks. The encoder takes the input sequence as input and generates a sequence of hidden representations, while the decoder takes the output tokens generated so far as input and generates a sequence of hidden representations that are conditioned on the input sequence and previous outputs.

Traditional Seq2Seq vs. attention-based models

In traditional Seq2Seq models, the encoder compresses the input sequence into a single fixed-length vector, which is then used as the initial hidden state of the decoder. However, in more recent Seq2Seq models, such as attention-based models, a context vector c_i is computed from the encoder’s hidden states for each output timestep i. It summarizes the relevant information from the input sequence that is needed for generating the output at that timestep.

The decoder then uses the context vector c_i along with the previous hidden state s_{i-1} to generate the output for the current timestep i. This allows the decoder to focus on different parts of the input sequence at different timesteps and generate more accurate and informative outputs.

The context vector c_i is computed by taking a weighted sum of the encoder’s hidden states, where the weights are produced by a learned alignment model from the decoder’s previous state and the encoder’s hidden states. This means that the context vector c_i is different for each output timestep i, allowing the decoder to attend to different parts of the input sequence as needed. The context vector c_i can be expressed mathematically as:

c_i = Σ_j α_ij h_j

where i is the current timestep of the decoder and j indexes the hidden states of the encoder. The attention weights α_ij are calculated using an alignment model, which is typically a feedforward neural network (FFNN) parametrized by learnable weights. The alignment model takes as input the previous hidden state s_{i-1} of the decoder and the hidden state h_j of the encoder, and produces a scalar score e_ij:

e_ij = a(s_{i-1}, h_j)

where a is the alignment model. The scores are then normalized using the softmax function to obtain the attention weights α_ij:

α_ij = exp(e_ij) / Σ_k exp(e_ik)

where k indexes the hidden states of the encoder.

The attention weights α_ij reflect the importance of each encoder hidden state h_j with respect to the previous decoder hidden state s_{i-1} in generating the output y_i. The higher the attention weight α_ij, the more important the corresponding hidden state h_j is for generating the output at the current timestep i. By computing a context vector c_i as a weighted sum of the encoder’s hidden states, the decoder is able to attend to different parts of the input sequence at different timesteps and generate more accurate and informative outputs.
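
The computation of one context vector can be sketched in NumPy as follows. The text only says that the alignment model is typically a small feedforward network; the additive form used here, e_ij = v_a · tanh(W_a s_{i-1} + U_a h_j), and the parameter names W_a, U_a, and v_a are assumptions chosen for the example.

```python
import numpy as np


def softmax(e):
    """Numerically stable softmax over a 1-D array of scores."""
    w = np.exp(e - np.max(e))
    return w / w.sum()


def alignment_scores(s_prev, H, W_a, U_a, v_a):
    """Scores e_ij for one decoder step against all encoder states.

    s_prev: previous decoder state, shape (d_s,)
    H:      encoder hidden states, one row per input position, shape (n, d_h)
    W_a:    shape (d_a, d_s); U_a: shape (d_a, d_h); v_a: shape (d_a,)
    """
    return np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a


def attention_context(s_prev, H, W_a, U_a, v_a):
    """Return the context vector c_i and the attention weights alpha_i."""
    alpha = softmax(alignment_scores(s_prev, H, W_a, U_a, v_a))
    c = alpha @ H  # weighted sum of the encoder hidden states
    return c, alpha
```

If all alignment parameters are zero, every score is equal, the attention weights become uniform, and the context vector reduces to the plain average of the encoder states. Training moves the weights away from this uniform case so that each output step emphasizes the input positions it needs.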

The difference between context vector in Seq2Seq and context vector in attention

In a traditional Seq2Seq model, the encoder compresses the input sequence into a fixed-length vector, which is then used as the initial hidden state of the decoder. The decoder then generates the output sequence word by word, conditioned on the input and the previous output words. The fixed-length vector essentially contains all the information of the input sequence, and the decoder needs to rely solely on it to generate the output sequence. This can be expressed mathematically as:

c = h_n

where c is the fixed-length vector representing the input sequence, and h_n is the final hidden state of the encoder.

In an attention-based Seq2Seq model, a context vector c_i is computed for each output timestep i, which summarizes the relevant information from the input sequence that is needed for generating the output at that timestep. The context vector is a weighted sum of the encoder’s hidden states, where the weights are produced by a learned alignment model from the decoder’s previous state and the encoder’s hidden states.

The attention mechanism allows the decoder to choose which aspects of the input sequence to give attention to, rather than requiring the encoder to compress all the information into a single vector and transferring it to the decoder.

Summarizing articles on PMDD treatments using TextRank

In this blog post, I want to share what I learned about treating PMDD by summarizing articles with TextRank. TextRank is not really a summarization algorithm; it is used for extracting the top sentences, but I decided to use it anyway and see the results. I started by using the googlesearch library in Python to search for “PMDD treatments – calcium, hormones, SSRIs, scientific evidence”. The search resulted in a list of URLs to various articles on PMDD treatments. However, not all of them were useful for my purposes, as some were blocked due to access restrictions. I used BeautifulSoup to extract the text from the remaining articles.

In order to exclude irrelevant paragraphs, I used a library called Justext. This library is designed for removing boilerplate content and other non-relevant text from HTML pages. Justext uses heuristics to determine which parts of the page are boilerplate and which are not, and then filters out the former. Justext tries to identify these sections by analyzing the length of the text, the density of links, and the presence of certain HTML tags.

Some examples of the kinds of content that Justext can remove include navigation menus, copyright statements, disclaimers, and other non-content-related text. It does not work perfectly, as I still ended up with sentences such as the following in the resulting articles: “This content is owned by the AAFP. A person viewing it online may make one printout of the material and may use that printout only for his or her personal, non-commercial reference.”

Next, I used existing code that implements the TextRank algorithm that I found online. I slightly improved it so that, instead of the bag-of-words method, the algorithm uses sentence embeddings. Let’s go step by step through the algorithm. I defined a class called TextRank4Sentences. Here is a description of each line in the __init__ method of this class:

self.damping = 0.85: This sets the damping coefficient used in the TextRank algorithm to 0.85. It is the probability that the random walk follows a similarity link from one sentence to another, rather than jumping to a random sentence.

self.min_diff = 1e-5: This sets the convergence threshold. The algorithm will stop iterating when the difference between the PageRank scores of two consecutive iterations is less than this value.

self.steps = 100: This sets the maximum number of iterations to run the algorithm before stopping.

self.text_str = None: This initializes a variable to store the input text.

self.sentences = None: This initializes a variable to store the individual sentences of the input text.

self.pr_vector = None: This initializes a variable to store the TextRank scores for each sentence in the input text.

import re

import numpy as np
from nltk import sent_tokenize, word_tokenize
from nltk.cluster.util import cosine_distance
from sklearn.metrics.pairwise import cosine_similarity

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

MULTIPLE_WHITESPACE_PATTERN = re.compile(r"\s+", re.UNICODE)

class TextRank4Sentences():
    def __init__(self):
        self.damping = 0.85  # damping coefficient, usually is .85
        self.min_diff = 1e-5  # convergence threshold
        self.steps = 100  # iteration steps
        self.text_str = None
        self.sentences = None
        self.pr_vector = None

The next step is defining a private method _sentence_similarity() which takes in two sentences and returns their cosine similarity using a pre-trained model. The method encodes each sentence into a vector using the pre-trained model and then calculates the cosine similarity between the two vectors using another function core_cosine_similarity().

core_cosine_similarity() is a separate function that measures the cosine similarity between two sets of vectors. It takes in two arrays of vectors as inputs and returns a matrix of similarity scores, each between -1 and 1. The function uses the cosine_similarity() function from the sklearn library to calculate the similarity scores. The cosine similarity is a measure of the similarity between two non-zero vectors of an inner product space. It is calculated as the cosine of the angle between the two vectors.

Mathematically, given two vectors u and v, the cosine similarity is defined as:

cosine_similarity(u, v) = (u . v) / (||u|| ||v||)

where u . v is the dot product of u and v, and ||u|| and ||v|| are the magnitudes of u and v respectively.
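
As a quick worked example of this formula, the following sketch computes the cosine similarity of a few 2-D vectors with NumPy. The helper name cosine is hypothetical and is separate from the sklearn-based function used in this post.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity of two non-zero 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


same_direction = cosine([1, 0], [2, 0])   # vectors pointing the same way
orthogonal = cosine([1, 0], [0, 3])       # vectors at a right angle
opposite = cosine([1, 0], [-1, 0])        # vectors pointing opposite ways
```

The three cases give 1, 0, and -1 respectively, which shows that the score depends only on the angle between the vectors and not on their lengths. Sentence embeddings rarely point in exactly opposite directions, so in practice most scores between sentences fall between 0 and 1.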

def core_cosine_similarity(vector1, vector2):
    """
    measure cosine similarity between two vectors
    :param vector1:
    :param vector2:
    :return: matrix of cosine similarity values, each between -1 and 1
    """
    sim_score = cosine_similarity(vector1, vector2)
    return sim_score

class TextRank4Sentences():
    def __init__(self):
        ...

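    def _sentence_similarity(self, sent1, sent2):
        # each argument may be one sentence or a list of sentences;
        # wrap single strings so that encode() always receives a list
        first = [sent1] if isinstance(sent1, str) else sent1
        second = [sent2] if isinstance(sent2, str) else sent2
        first_sent_embedding = model.encode(first)
        second_sent_embedding = model.encode(second)

        # result has shape (len(first), len(second))
        return core_cosine_similarity(first_sent_embedding, second_sent_embedding)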

In the next function, the similarity matrix is built for the given sentences. The function _build_similarity_matrix takes a list of sentences as input and creates an empty similarity matrix sm with dimensions len(sentences) x len(sentences). Then, for each sentence in the list, the function computes its similarity with all other sentences in the list using the _sentence_similarity function. After calculating the similarity scores for all sentence pairs, the function get_symmetric_matrix is used to make the similarity matrix symmetric.

The function get_symmetric_matrix adds the transpose of the matrix to itself, and then subtracts the diagonal elements of the original matrix. In other words, for each element (i, j) of the input matrix, the corresponding element (j, i) is added to it to make it symmetric. However, the diagonal elements (i, i) of the original matrix are not added twice, so they need to be subtracted once from the sum of the corresponding elements in the upper and lower triangles. The resulting matrix has the same values in the upper and lower triangles, and is symmetric along its main diagonal. The similarity matrix is made symmetric in order to ensure that the similarity score between two sentences in the matrix is the same regardless of their order, and it also simplifies the computation.

def get_symmetric_matrix(matrix):
    """
    Get Symmetric matrix
    :param matrix:
    :return: matrix
    """
    return matrix + matrix.T - np.diag(matrix.diagonal())

class TextRank4Sentences():
    def __init__(self):
        ...

    def _sentence_similarity(self, sent1, sent2):
        ...
    
    def _build_similarity_matrix(self, sentences, stopwords=None):
        # create an empty similarity matrix
        sm = np.zeros([len(sentences), len(sentences)])
    
        for idx, sentence in enumerate(sentences):
            print("Current location: %d" % idx)
            sm[idx] = self._sentence_similarity(sentence, sentences)
    
        # Get symmetric matrix
        sm = get_symmetric_matrix(sm)
    
        # Normalize matrix by column
        norm = np.sum(sm, axis=0)
        # divide only where the column sum is non-zero; other entries stay 0
        sm_norm = np.divide(sm, norm, out=np.zeros_like(sm), where=norm != 0)
    
        return sm_norm
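
To make the two matrix steps concrete, here is a small sketch on a made-up 3 x 3 matrix. The scores are invented, and the input is deliberately upper-triangular, since that is the case where adding the transpose fills in the missing half (applied to a matrix that is already full and symmetric, the same formula would double the off-diagonal values).

```python
import numpy as np

def get_symmetric_matrix(matrix):
    # mirror the matrix across its diagonal, counting the diagonal once
    return matrix + matrix.T - np.diag(matrix.diagonal())

# Toy upper-triangular scores for three sentences (made-up values)
scores = np.array([
    [1.0, 0.2, 0.6],
    [0.0, 1.0, 0.4],
    [0.0, 0.0, 1.0],
])

sym = get_symmetric_matrix(scores)

# Column normalization: every non-empty column is rescaled to sum to 1
col_sums = sym.sum(axis=0)
normalized = np.divide(sym, col_sums, out=np.zeros_like(sym), where=col_sums != 0)
```

After normalization each column can be read as a set of weights that a sentence distributes over the sentences it is similar to.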

In the next function, the ranking algorithm PageRank is implemented to calculate the importance of each sentence in the document. The similarity matrix created in the previous step is used as the basis for the PageRank algorithm. The function takes the similarity matrix as input and initializes the pagerank vector with a value of 1 for each sentence.

In each iteration, the pagerank vector is updated based on the similarity matrix and damping coefficient. The damping coefficient represents the probability of following a link from the current sentence to a similar one; with probability 1 - damping, the walk jumps to a random sentence instead. The algorithm continues to iterate until either the maximum number of steps is reached or the difference between the current and previous pagerank vector is less than a threshold value. Finally, the function returns the pagerank vector, which represents the importance score for each sentence.

class TextRank4Sentences():
    def __init__(self):
        ...

    def _sentence_similarity(self, sent1, sent2):
        ...
    
    def _build_similarity_matrix(self, sentences, stopwords=None):
        ...

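    def _run_page_rank(self, similarity_matrix):

        # start with a score of 1.0 for every sentence
        pr_vector = np.ones(len(similarity_matrix))

        # Iteration
        for epoch in range(self.steps):
            previous_pr = pr_vector
            pr_vector = (1 - self.damping) + self.damping * np.matmul(similarity_matrix, pr_vector)
            # stop once no score changes by more than min_diff; comparing the
            # sums of the vectors is not enough, because the sum barely changes
            # between iterations when the matrix columns sum to 1
            if np.abs(pr_vector - previous_pr).max() < self.min_diff:
                break

        return pr_vector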
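
The update rule can be tried out on its own with a tiny matrix. The sketch below is a standalone function rather than the class method; the matrix values and the defaults for damping, steps and min_diff are assumptions chosen for illustration, and convergence is checked on the largest change in any single score.

```python
import numpy as np

def run_page_rank(matrix, damping=0.85, steps=100, min_diff=1e-6):
    """Power iteration; stops when no score moves by more than min_diff."""
    pr = np.ones(len(matrix))
    for _ in range(steps):
        previous = pr
        pr = (1 - damping) + damping * (matrix @ pr)
        if np.abs(pr - previous).max() < min_diff:
            break
    return pr

# Column-normalized toy matrix (each column sums to 1, values made up).
# Row 0 holds the largest weights, so sentence 0 should rank first.
m = np.array([
    [0.0, 0.8, 0.7],
    [0.6, 0.0, 0.3],
    [0.4, 0.2, 0.0],
])

scores = run_page_rank(m)
ranking = np.argsort(scores)[::-1]   # indices from most to least important
```

Because every column sums to 1, the scores always add up to the number of sentences; only their distribution changes from one iteration to the next.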

The _get_sentence function takes an index as input and returns the corresponding sentence from the list of sentences. If the index is out of range, it returns an empty string. This function is used later in the class to get the highest ranked sentences.

class TextRank4Sentences():
    def __init__(self):
        ...

    def _sentence_similarity(self, sent1, sent2):
        ...
    
    def _build_similarity_matrix(self, sentences, stopwords=None):
        ...

    def _run_page_rank(self, similarity_matrix):
        ...

    def _get_sentence(self, index):

        try:
            return self.sentences[index]
        except IndexError:
            return ""

The code then defines a method called get_top_sentences which returns a summary of the most important sentences in a document. The method takes two optional arguments: number (default=5) specifies the maximum number of sentences to include in the summary, and similarity_threshold (default=0.5) specifies the similarity score above which a candidate sentence is considered “too similar” to an already selected sentence and is left out of the summary.

The method first initializes an empty list called top_sentences to hold the selected sentences. It then checks if a pr_vector attribute has been computed for the document. If the pr_vector exists, it sorts the indices of the sentences in descending order based on their PageRank scores and saves them in the sorted_pr variable.

It then iterates through the sentences in sorted_pr, starting from the one with the highest PageRank score. For each sentence, it removes any extra whitespace, replaces newlines with spaces, and checks if it is too similar to any of the sentences already selected for the summary. If it is not too similar, it adds the sentence to top_sentences. Once the selected sentences are finalized, the method concatenates them into a single string separated by spaces, and returns the summary.

class TextRank4Sentences():
    def __init__(self):
        ...

    def _sentence_similarity(self, sent1, sent2):
        ...
    
    def _build_similarity_matrix(self, sentences, stopwords=None):
        ...

    def _run_page_rank(self, similarity_matrix):
        ...

    def _get_sentence(self, index):
        ...
   
    def get_top_sentences(self, number=5, similarity_threshold=0.5):
        top_sentences = []
    
        if self.pr_vector is not None:
            sorted_pr = np.argsort(self.pr_vector)
            sorted_pr = list(sorted_pr)
            sorted_pr.reverse()
    
            index = 0
            while len(top_sentences) < number and index < len(sorted_pr):
                sent = self.sentences[sorted_pr[index]]
                sent = normalize_whitespace(sent)
                sent = sent.replace('\n', ' ')
    
                # Check if the sentence is too similar to any of the sentences already in top_sentences
                is_similar = False
                for s in top_sentences:
                    sim = self._sentence_similarity(sent, s)
                    if sim > similarity_threshold:
                        is_similar = True
                        break
    
                if not is_similar:
                    top_sentences.append(sent)
    
                index += 1
        
        summary = ' '.join(top_sentences)
        return summary
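
The selection logic can be illustrated without an embedding model. In the sketch below the pairwise similarities are precomputed in a made-up matrix instead of being calculated by _sentence_similarity, and the function name select_diverse and the scores are hypothetical.

```python
import numpy as np

def select_diverse(ranked_indices, similarity, number=2, threshold=0.5):
    """Greedy pick: walk the ranking, skip items too similar to earlier picks."""
    chosen = []
    for idx in ranked_indices:
        if len(chosen) == number:
            break
        if all(similarity[idx, c] <= threshold for c in chosen):
            chosen.append(idx)
    return chosen

# Made-up pairwise similarities: sentences 0 and 1 are near-duplicates
similarity = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])
pr_scores = np.array([0.5, 0.4, 0.1])   # hypothetical PageRank scores

ranked = [int(i) for i in np.argsort(pr_scores)[::-1]]
picked = select_diverse(ranked, similarity)
```

Sentence 1 has the second-highest score, but it is skipped because it is too close to sentence 0, so the lower-ranked sentence 2 takes its place.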

The _remove_duplicates method takes a list of sentences as input and returns a list of unique sentences, by removing any duplicates in the input list.

class TextRank4Sentences():
    def __init__(self):
        ...

    def _sentence_similarity(self, sent1, sent2):
        ...
    
    def _build_similarity_matrix(self, sentences, stopwords=None):
        ...

    def _run_page_rank(self, similarity_matrix):
        ...

    def _get_sentence(self, index):
        ...
   
    def get_top_sentences(self, number=5, similarity_threshold=0.5):
        ...
    
    def _remove_duplicates(self, sentences):
        seen = set()
        unique_sentences = []
        for sentence in sentences:
            if sentence not in seen:
                seen.add(sentence)
                unique_sentences.append(sentence)
        return unique_sentences

The analyze method takes text, a list of document strings (here, the article texts), and an optional list of stop words stop_words. It first removes duplicate documents by using the set() method and then joins the remaining documents into a single string self.full_text. Note that set() does not preserve the original order of the documents.

It then uses the sent_tokenize() function from the nltk library to tokenize the text into sentences and removes duplicate sentences using the _remove_duplicates() method. It also removes sentences that have a word count less than or equal to the 10th percentile of all sentence lengths (the code stores this cutoff in a variable named fifth_percentile).

After that, the method calculates a similarity matrix using the _build_similarity_matrix() method, passing in the preprocessed list of sentences and the stop_words list.

Finally, it runs the PageRank algorithm on the similarity matrix using the _run_page_rank() method to obtain a ranking of the sentences based on their importance in the text. This ranking is stored in self.pr_vector.

class TextRank4Sentences():
    ...

    def analyze(self, text, stop_words=None):
        self.text_unique = list(set(text))
        self.full_text = ' '.join(self.text_unique)
        #self.full_text = self.full_text.replace('\n', ' ')
        
        self.sentences = sent_tokenize(self.full_text)
        
        # for i in range(len(self.sentences)):
        #     self.sentences[i] = re.sub(r'[^\w\s$]', '', self.sentences[i])
    
        self.sentences = self._remove_duplicates(self.sentences)
        
        sent_lengths = [len(sent.split()) for sent in self.sentences]
        fifth_percentile = np.percentile(sent_lengths, 10)
        self.sentences = [sentence for sentence in self.sentences if len(sentence.split()) > fifth_percentile]

        print("Min length: %d, Total number of sentences: %d" % (fifth_percentile, len(self.sentences)) )

        similarity_matrix = self._build_similarity_matrix(self.sentences, stop_words)

        self.pr_vector = self._run_page_rank(similarity_matrix)
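
The two clean-up steps (dropping duplicate sentences and dropping very short ones) can be tried on a handful of toy sentences. This is a minimal sketch with made-up input; it uses dict.fromkeys as a compact, order-preserving alternative to the _remove_duplicates loop.

```python
import numpy as np

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Menu",
    "The quick brown fox jumps over the lazy dog.",   # exact duplicate
    "A second, somewhat longer sentence follows here.",
    "Short one.",
    "This sentence has exactly six words.",
]

# Remove exact duplicates while keeping the original order
unique = list(dict.fromkeys(sentences))

# Drop sentences at or below the 10th percentile of word counts
lengths = [len(s.split()) for s in unique]
cutoff = np.percentile(lengths, 10)
kept = [s for s in unique if len(s.split()) > cutoff]
```

Here the one-word fragment "Menu" falls below the cutoff and is removed, which is the kind of navigation debris this filter is meant to catch.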

In order to find articles, I used the googlesearch library. The code below performs a Google search through the search() function provided by the library. It searches for the query “PMDD treatments – calcium, hormones, SSRIs, scientific evidence” and retrieves the top 7 search results.

# summarize articles
import requests
from bs4 import BeautifulSoup
from googlesearch import search
import justext
query = "PMDD treatments - calcium, hormones, SSRIs, scientific evidence"

# perform the google search and retrieve the top 7 search results
top_results = []
for url in search(query, num_results=7):
    top_results.append(url)

In the next part, the code extracts the article text for each of the top search results collected in the previous step. For each URL in the top_results list, the code sends an HTTP GET request to the URL using the requests library. It then uses the justext library to extract the main content of the webpage by removing any boilerplate text (i.e., non-content text).

article_texts = []

# extract the article text for each of the top search results
for url in top_results:
    response = requests.get(url)
    paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
    text = ''
    for paragraph in paragraphs:
        if not paragraph.is_boilerplate:
            text += paragraph.text + '\n'

    if "Your access to PubMed Central has been blocked" not in text:
        article_texts.append(text.strip())
        print(text)
    print('-' * 50)
    
print("Total articles collected: %d" % len(article_texts))

In the final step, the extracted article texts are passed to an instance of the TextRank4Sentences class, which is used to perform text summarization. The output of get_top_sentences() is a single string containing the top-ranked sentences in the input text, joined by spaces. These are considered to be the most important and representative sentences for summarizing the content of the text. This string is stored in the variable summary_text.

# summarize
tr4sh = TextRank4Sentences()
tr4sh.analyze(article_texts)
summary_text = tr4sh.get_top_sentences(15)

Results:
(I did not list irrelevant sentences that appeared in the final results, such as “You will then receive an email that contains a secure link for resetting your password…”)

Total articles collected: 6

There have been at least 15 randomized controlled trials of the use of selective serotonin-reuptake inhibitors (SSRIs) for the treatment of severe premenstrual syndrome (PMS), also called premenstrual dysphoric disorder (PMDD).

It is possible that the irritability/anger/mood swings subtype of PMDD is differentially responsive to treatments that lead to a quick change in ALLO availability or function, for example, symptom-onset SSRI or dutasteride.
* My note: ALLO is allopregnanolone
* My note: Dutasteride is a synthetic 4-azasteroid compound that is a selective inhibitor of both the type 1 and type 2 isoforms of steroid 5 alpha-reductase

From 2 to 10 percent of women of reproductive age have severe distress and dysfunction caused by premenstrual dysphoric disorder, a severe form of premenstrual syndrome.

The rapid efficacy of selective serotonin reuptake inhibitors (SSRIs) in PMDD may be due in part to their ability to increase ALLO levels in the brain and enhance GABAA receptor function with a resulting decrease in anxiety.

Clomipramine, a serotoninergic tricyclic antidepressant that affects the noradrenergic system, in a dosage of 25 to 75 mg per day used during the full cycle or intermittently during the luteal phase, significantly reduced the total symptom complex of PMDD.

Relapse was more likely if a woman stopped sertraline after only 4 months versus 1 year, if she had more severe symptoms prior to treatment and if she had not achieved full symptom remission with sertraline prior to discontinuation.

Women with negative views of themselves and the future caused or exacerbated by PMDD may benefit from cognitive-behavioral therapy. This kind of therapy can enhance self-esteem and interpersonal effectiveness, as well as reduce other symptoms.

Educating patients and their families about the disorder can promote understanding of it and reduce conflict, stress, and symptoms.

Anovulation can also be achieved with the administration of estrogen (transdermal patch, gel, or implant).

In a recent meta-analysis of 15 randomized, placebo-controlled studies of the efficacy of SSRIs in PMDD, it was concluded that SSRIs are an effective and safe first-line therapy and that there is no significant difference in symptom reduction between continuous and intermittent dosing.

Preliminary confirmation of alleviation of PMDD with suppression of ovulation with a GnRH agonist should be obtained prior to hysterectomy.

Sexual side effects, such as reduced libido and inability to reach orgasm, can be troubling and persistent, however, even when dosing is intermittent.
* My note: I think this sentence refers to the side-effects of SSRIs


Calculating Confidence Interval for a Percentile

Calculating the confidence interval for a percentile is a crucial step in understanding the variability and the uncertainty around the estimated value. In many real-world applications, the distribution of the data is unknown and this makes it difficult to determine the confidence intervals. In such scenarios, using a binomial distribution can be a viable alternative to estimate the confidence intervals for a percentile.

For instance, let’s consider a variable with 300 data points and we want to calculate the 70th and 90th percentiles and the corresponding confidence intervals for the variable. To do this, we can use a binomial distribution approach.

First, we need to choose an alpha level, which is a probability that determines the size of the confidence interval. A common choice for alpha is 0.05, which corresponds to a 95% confidence interval.

Next, we use the cumulative distribution function (CDF) of the binomial distribution to estimate the lower and upper bounds of the confidence interval. The CDF of the binomial distribution gives the probability of getting k or fewer successes in n independent Bernoulli trials, where the probability of success in each trial is p.

To calculate the 70th percentile and its confidence interval, we use the following steps:

  1. Set n = 300, which is the number of data points.
  2. Set p = 0.7, which corresponds to the 70th percentile.
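  3. Calculate the binomial quantiles using the CDF. For a probability level c (here alpha / 2 for the lower bound and 1 – alpha / 2 for the upper bound), the quantile is the smallest k such that P(X <= k) >= c, where X is a binomial random variable with parameters n and p.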
  4. Use the CDF to determine the lower and upper bounds of the confidence interval.

Below is the python code for calculating the confidence interval for the 70th percentile.

alpha – alpha is a parameter representing the significance level for the calculation of the confidence interval. It is the probability that the confidence interval does not contain the true value of the parameter being estimated, so the confidence level is 1 – alpha. The value of alpha is typically set to 0.05 or 0.01, meaning that there is a 95% or 99% chance, respectively, that the confidence interval contains the true value. In the code, alpha=0.05 is the default value for alpha, but it can be changed to a different value if desired.

n – number of observations

q – percentile value

from scipy.stats import binom
import numpy as np

alpha = 0.05
n = 300
q = 0.7

Below is the code for calculating the upper and lower bounds for the confidence interval. The u value is calculated as the ceiling of the binomial distribution’s quantile function (ppf) evaluated at 1 – alpha / 2 (1 – 0.05 / 2 = 0.975), and the value is shifted by adding an array of numbers from -2 to 2, which gives five candidate upper bounds. Any values of u that are greater than n are set to infinity. The l value is calculated in the same way at alpha / 2 (0.025), and any values of l that are less than 0 are set to negative infinity.

u = np.ceil(binom.ppf(1 - alpha / 2, n, q)) + np.arange(-2, 3)
u[u > n] = np.inf

l = np.ceil(binom.ppf(alpha / 2, n, q)) + np.arange(-2, 3)
l[l < 0] = -np.inf

# From the calculation of bounds, np.ceil(binom.ppf(1 - alpha / 2, n, q)) and np.ceil(binom.ppf(alpha / 2, n, q)), we obtain that
# the upper bound value is 225 and the lower bound value is 194. This means that given a sample of size 300, a binomial distribution, and
# probability of success p=0.7, we are 95% certain that the number of successes will be between 194 and 225.

Next we calculate the coverage probability for each candidate pair of bounds. The coverage is a matrix in which each row corresponds to a candidate lower bound and each column to a candidate upper bound; each value is the probability that the interval between that pair of bounds contains the true percentile.

The coverage calculation uses the binom.cdf function to calculate the cumulative distribution function (CDF) for the binomial distribution, which is then used to determine the coverage probability of each combination of u and l. Once the coverage matrix is calculated, the code finds the index i corresponding to the combination of u and l that gives the closest coverage probability to 1-alpha.

coverage = np.zeros((len(l), len(u)))

for i, a in enumerate(l):
    for j, b in enumerate(u):
        coverage[i, j] = binom.cdf(b - 1, n, q) - binom.cdf(a - 1, n, q)

Next we select the upper and lower bounds of the confidence interval based on the coverage of the interval. The code first checks if the maximum coverage is less than 1 minus the significance level alpha. If it is, the code selects the pair of bounds with the maximum coverage probability. Otherwise, the code selects the pair of bounds with the smallest coverage probability that is still greater than or equal to 1 minus alpha.

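if np.max(coverage) < 1 - alpha:
    i = np.where(coverage == np.max(coverage))
else:
    i = np.where(coverage == np.min(coverage[coverage >= 1 - alpha]))

# rows of coverage index the lower bounds (l), columns the upper bounds (u)
i_l = i[0][0]
i_u = i[1][0]

# clip to the valid range, then subtract 1
# (presumably to turn a 1-based rank into a 0-based index)
u_final = min(n, u[i_u])
u_final = max(0, int(u_final)-1)
        
l_final = min(n, max(0, l[i_l]))
l_final = max(0, int(l_final)-1)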

The resulting l_final and u_final are 192 and 223, respectively. Therefore, if you have a sample of 300 and you want to calculate the confidence interval for the 70th percentile of a variable X, you would sort the values in ascending order, and then you would take the values of X at positions 192 and 223 of the sorted array (the code subtracts 1 from each bound, so these read as zero-based indices).
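
For comparison, here is a simplified variant of the same idea wrapped in a function and applied to synthetic data. It is a sketch under stated assumptions and not the exact procedure above: instead of searching over shifted bounds, it takes the two binomial quantiles directly, which gives a slightly conservative interval. The function name percentile_ci and the random sample are made up for illustration.

```python
import numpy as np
from scipy.stats import binom

def percentile_ci(values, q, alpha=0.05):
    """Distribution-free CI for the q-th quantile from order statistics."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # 1-based ranks of the order statistics used as interval endpoints
    lower_rank = max(int(binom.ppf(alpha / 2, n, q)), 1)
    upper_rank = min(int(binom.ppf(1 - alpha / 2, n, q)) + 1, n)
    # probability that the true quantile lies between the two order statistics
    coverage = binom.cdf(upper_rank - 1, n, q) - binom.cdf(lower_rank - 1, n, q)
    return x[lower_rank - 1], x[upper_rank - 1], coverage

rng = np.random.default_rng(0)
data = rng.normal(size=300)          # synthetic sample, for illustration only

low, high, coverage = percentile_ci(data, q=0.7)
point_estimate = np.percentile(data, 70)
```

The returned coverage shows how close the interval gets to the nominal 95%; the search over shifted bounds in the code above is a way of tightening the interval while keeping the coverage at or above that level.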

NLP – Word Embeddings – BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based neural network architecture for natural language processing tasks such as text classification, question answering, and language inference. One important feature of BERT is its use of word embeddings, which are mathematical representations of words in a continuous vector space.

In BERT, word embeddings are learned during the pre-training phase and are fine-tuned during the task-specific fine-tuning phase. These embeddings are learned by training the model on a large corpus of text, and they are able to capture semantic and syntactic properties of words.

The BERT model architecture is composed of multiple layers of transformer blocks, with the input being a sequence of tokens (e.g., words or subwords) and the output being a contextualized representation of each token in the sequence. The model also includes a pooled output, which is used for many downstream tasks. In the original BERT implementation, it is produced by taking the final representation of the first token ([CLS]) and passing it through a dense layer with a tanh activation.

How does BERT differ from Word2Vec or GloVe?

  • Training objective: The main difference between BERT and Word2Vec/GloVe is the training objective. BERT is trained to predict missing words in a sentence (masked language modeling) and to predict whether one sentence follows another (next sentence prediction); this way the model learns to understand the context of the words. Word2Vec is trained to predict a word given its context (CBOW) or the context given a word (skip-gram), while GloVe is fit to global word co-occurrence statistics; this way these models learn the associations between words.
  • Inputs: BERT takes a single sentence or a pair of sentences as input and, for a pair, learns to understand the relationship between them. Word2Vec and GloVe work only with a window of context words around each word.
  • Directionality: BERT is a bidirectional model, meaning that it takes into account the context of a word before and after it in a sentence. This is achieved by training on both the left and the right context of the word. Word2Vec also looks at words on both sides of the target, but only within a fixed window and without regard to word order, and GloVe is trained on the global corpus statistics; both produce a single static vector per word, regardless of the sentence it appears in.
  • Pre-training: BERT is a pre-trained model that can be fine-tuned on specific tasks. Word2Vec and GloVe vectors are also pre-trained without a downstream task in mind, but they are typically used as fixed input features, whereas in BERT the whole network is fine-tuned, which can provide better performance on some tasks.

BERT Architecture

The core component of BERT is a stack of transformer encoder layers, which are based on the transformer architecture introduced by Vaswani et al. in 2017. Each transformer encoder layer in BERT consists of multiple self-attention heads. The self-attention mechanism allows the model to weigh the importance of different parts of the input sequence when generating the representation of each token. This allows the model to understand the relationships between words in a sentence and to capture the context in which a word is used.

The transformer architecture also includes a position-wise feed-forward neural network, which is applied to the combined output of the self-attention heads to produce the final representation of each token in that layer.

Transformer Encoder Layer

In the transformer encoder layer, for every input token in a sequence, the self-attention mechanism computes key, value, and query vectors, which are used to create a weighted representation of the input. The key, value and query vectors are computed by applying different linear transformations (matrix multiplications) to the input embeddings; these linear transformations are learned during the training process.
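
The computation can be sketched for a single attention head with NumPy. This is an illustration of scaled dot-product attention, not BERT's actual implementation: the projection matrices are random here (in a trained model they are learned), and the sizes are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # linear projections of the input
    scores = q @ k.T / np.sqrt(k.shape[-1])    # query-key similarity per token pair
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights                # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8             # arbitrary toy sizes
x = rng.normal(size=(seq_len, d_model))        # stand-in for input embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
```

Row i of weights says how much token i attends to every token in the sequence, and row i of output is the corresponding weighted combination of value vectors.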

In BERT, the input representations are computed by combining multiple embedding layers. The input is first tokenized into word pieces, which is a technique that allows the model to handle out-of-vocabulary words by breaking them down into subword units. The tokenized input is then passed through three embedding layers:

  • Token Embedding: Each word piece is represented as a token embedding.
  • Position Embedding: Each word piece is also represented by a position embedding, which encodes information about the position of the word piece in the input sequence.
  • Segment Embedding: BERT also uses a segment embedding to represent the input segments. This embedding helps BERT to distinguish between the two sentences when the input is a pair of sentences.

These embeddings are added together to obtain a fixed-length vector representation of each word piece. The special tokens [CLS] and [SEP] mark the structure of the input: the [CLS] token is placed at the beginning, and its final representation is used as the representation of the entire input sequence in classification tasks, while the [SEP] token marks the end of each input segment and so separates the two sentences when the input is a pair of sentences.
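
A minimal sketch of how the three embeddings combine is shown below. The lookup tables are random, and the vocabulary size, hidden size and token ids are made-up placeholders; the real model also applies layer normalization and dropout after the sum, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, n_segments, hidden = 30, 16, 2, 8   # toy sizes

# One lookup table per embedding type (learned in a real model)
token_table = rng.normal(size=(vocab_size, hidden))
position_table = rng.normal(size=(max_len, hidden))
segment_table = rng.normal(size=(n_segments, hidden))

# Hypothetical ids for the input: [CLS] a b [SEP] c d [SEP]
token_ids = np.array([1, 10, 11, 2, 12, 13, 2])
segment_ids = np.array([0, 0, 0, 0, 1, 1, 1])   # first vs. second sentence
position_ids = np.arange(len(token_ids))

# Element-wise sum of the three lookups gives one vector per word piece
embeddings = (
    token_table[token_ids]
    + position_table[position_ids]
    + segment_table[segment_ids]
)
```

Both [SEP] tokens share the same token embedding, yet they end up with different input vectors because their positions and segments differ.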

Masked Language Modeling

If you try to predict each word of the input sequence using the training data with cross-entropy loss, the learning task becomes trivial for the network. Since the network knows beforehand what it has to predict, it can easily learn weights to reach a 100% classification accuracy.

The masked language modeling (MLM) approach, also known as the “masked word prediction” task, addresses this problem by randomly masking a portion of the input words (e.g., 15%) during training and requiring the network to predict the original value of the masked words based on the context. By masking a portion of the input words, the network is forced to understand the context of the words and to learn meaningful representations of the input.

In the MLM approach, the network is only required to predict the value of the masked words, and the loss is calculated only over the masked words. This means that the model is not learning to predict words it has already seen, but instead, it is learning to predict words it hasn’t seen while seeing all the context around those words.
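
The masking step can be sketched as follows. This is a simplified illustration: every selected position is replaced by a mask id, whereas the original BERT recipe replaces only most of the selected tokens with [MASK] and swaps in a random token or keeps the original for the rest. The constants MASK_ID and IGNORE and the token ids are placeholders chosen for this example.

```python
import numpy as np

MASK_ID = 103    # placeholder id for the [MASK] token (assumption)
IGNORE = -100    # label value meaning "not scored in the loss" (assumption)

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and build labels for only those positions."""
    rng = np.random.default_rng(seed)
    token_ids = np.asarray(token_ids)
    n_mask = max(1, int(round(mask_prob * len(token_ids))))
    positions = rng.choice(len(token_ids), size=n_mask, replace=False)

    inputs = token_ids.copy()
    labels = np.full_like(token_ids, IGNORE)
    labels[positions] = token_ids[positions]   # remember the original tokens
    inputs[positions] = MASK_ID                # hide them from the model
    return inputs, labels

tokens = np.arange(1000, 1020)   # 20 made-up token ids
inputs, labels = mask_tokens(tokens)
```

With 20 tokens and a 15% masking rate, three positions are hidden; the loss would be computed only where labels differs from IGNORE.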

In addition to MLM, BERT also uses another objective during pre-training called “next sentence prediction”. This objective is a binary classification task applied to the concatenation of two sentences: the model is trained to predict whether the second sentence is the real next sentence in the corpus or not. This objective helps BERT to understand the relationship between two sentences.

Product Review – Riviera Coconut Milk Kefir

I recently discovered Riviera Coconut Milk Kefir and I am so impressed with its flavor and consistency. As someone who used to enjoy traditional cow and goat milk kefir, I was disappointed when I had to switch to a dairy-free diet and could no longer enjoy my favorite drink. I tried several vegan alternatives but was disappointed with their thickness and lack of sour flavor.

However, I was pleasantly surprised when I found Riviera Coconut Milk Kefir. It has a creamy texture and a tangy, sour flavor that is incredibly close to traditional kefir. It’s the perfect solution for anyone looking for a dairy-free alternative that actually tastes like the real thing.

If you are searching for a dairy-free kefir that is both delicious and reminiscent of traditional kefir, I highly recommend giving Riviera Coconut Milk Kefir a try. It’s a great alternative that has satisfied my cravings for the tangy flavor I love.

They have a plain flavor with no added sugar, which is the one that I always get. It is made mostly from fermented coconut milk, which makes it high in fat, but the fat does make it creamy. While coconut milk does not contain much protein compared to dairy, there is added fava bean and pea protein. However, I am not sure how that compares to the protein found in dairy kefir in terms of nutrition. Unfortunately, I have an autoimmune reaction when I consume any milk proteins, which is why I have to go with the dairy-free option. There is also calcium added, which is useful for those who are on a dairy-free diet and don’t get calcium from dairy products.

Saccharomyces Boulardii as a supplement

Saccharomyces boulardii is a type of yeast that is commonly used as a probiotic supplement. S. boulardii was first discovered by French microbiologist Henri Boulard in the 1920s, who extracted it from the skin of lychee and mangosteen fruits in Indochina. S. boulardii is a non-colonizing yeast that does not adhere to the gut mucosa, and it is considered a transient microorganism that acts as a probiotic, meaning it can help to promote the growth of beneficial bacteria in the gut. It has been used traditionally as a remedy for diarrhea, and it is now being studied for its potential to help with a variety of other digestive and immune-related conditions such as Irritable Bowel Syndrome (IBS) and Clostridium difficile infections.

It is believed to support the growth of beneficial bacteria in the gut, and may help to improve digestion, boost the immune system, and reduce the risk of certain types of diarrhea. The yeast is typically taken in the form of a capsule or powder, and can be found at most health food stores or online retailers. It is also used in some products like yogurt. It is generally considered safe for most people to use, but it should be avoided by those with compromised immune systems or yeast allergies. Because it is a yeast rather than a bacterium, it is not killed by antibacterial antibiotics, which is one reason it has been studied for antibiotic-associated diarrhea; antifungal medications, however, can inactivate it, so it may not provide the intended benefits when taken together with them.

How does it differ from brewer’s yeast?

Saccharomyces boulardii and brewer’s yeast are both types of yeast, but they are different strains and have different properties and uses.

Brewer’s yeast is a type of Saccharomyces cerevisiae yeast that is used in the production of beer, bread and other fermented foods. It is also commonly used as a dietary supplement, primarily as a source of B vitamins, minerals and amino acids.

Saccharomyces boulardii, on the other hand, is a non-pathogenic yeast that is not used in brewing or baking. It is specifically used as a probiotic supplement to support gut health, and it is also considered safe for consumption.

While both yeasts are considered to be safe for consumption, S. boulardii is a more targeted supplement and it is generally recommended for specific conditions like diarrhea, whereas brewer’s yeast is a more general supplement with a broader range of benefits.

It is also important to note that some people may have an allergic reaction to brewer’s yeast, while S. boulardii is generally considered safe for most people.

Is there evidence that it could help with IBS?

There is some evidence to suggest that Saccharomyces boulardii may help to alleviate symptoms of Irritable Bowel Syndrome (IBS), a common digestive disorder characterized by abdominal pain, bloating, constipation and/or diarrhea. Studies have shown that taking S. boulardii supplements may help to reduce the frequency and severity of diarrhea in IBS patients. It is also thought to help in regulating the immune system’s response in the gut and improving gut barrier function.

What is a Clostridioides difficile infection and how can Saccharomyces boulardii potentially help with C. difficile infection?

Clostridioides difficile (C. difficile) infection, also known as C. diff, is a type of infection that affects the colon. It is caused by a bacterial strain called Clostridioides difficile. The bacteria produce toxins that cause inflammation and damage to the lining of the colon, leading to a range of symptoms such as diarrhea, abdominal pain, and fever. C. difficile infection most commonly occurs in people who have recently taken antibiotics, as these drugs can disrupt the balance of bacteria in the gut, allowing C. diff to overgrow. Other risk factors include being older, having a weakened immune system, and being hospitalized or receiving medical care in a long-term care facility.

Saccharomyces boulardii has been shown to be effective in reducing the symptoms of C. diff infection. It is thought to work by competing with C. diff for nutrients and space in the gut, and by stimulating the production of antibodies and other substances that inhibit the growth of C. diff. Additionally, S. boulardii may also help to restore the balance of beneficial bacteria in the gut, which can be disrupted by C. diff.

Saccharomyces boulardii and anxiety

One study in mice found that treatment with S. boulardii reduced gastrointestinal dysmotility, which is a common symptom of IBS. Gastrointestinal dysmotility refers to a dysfunction of the muscles and nerves in the gastrointestinal (GI) tract that control the movement of food and waste through the digestive system. This can cause problems with the normal coordinated contractions (peristalsis) of the muscles in the GI tract, leading to abnormal movements and transit of food and waste. The study also found that S. boulardii treatment reduced anxiety-like behavior in the mice, which suggests that it may also have a positive impact on the psychological symptoms of IBS.

How to take Saccharomyces boulardii as a supplement

  • Take the supplement with a meal, as the food can help to protect the probiotic yeast from stomach acid and increase its survival in the gut.
  • Start with a lower dose and gradually increase to the recommended dose over a period of a few days to a week.
  • Follow the recommended dosage on the supplement label.
  • Take the supplement consistently, at the same time each day, to maintain a consistent level of S. boulardii in the gut.
  • Continue taking the supplement for at least 2-4 weeks for best results.

It’s important to note that probiotics are considered safe for most people. However, if you have a weakened immune system or a serious illness, it’s always best to consult with a healthcare professional before starting any new supplement regimen.

NLP – Word Embeddings – ELMo

ELMo (Embeddings from Language Models) is a deep learning approach for representing words as vectors (also called word embeddings). It was developed by researchers at the Allen Institute for Artificial Intelligence and introduced in a paper published in 2018.

ELMo represents words as contextualized embeddings, meaning that the embedding for a word can change based on the context in which it is used. For example, the word “bank” could have different embeddings depending on whether it is used to refer to a financial institution or the edge of a river.

ELMo has been shown to improve the performance of a variety of natural language processing tasks, including question answering, textual entailment, named entity recognition, and sentiment analysis. It has become a popular approach for representing words in NLP models, and the pre-trained ELMo models are freely available for researchers to use.

How does ELMo differ from Word2Vec or GloVe?

ELMo differs from other word embedding approaches, such as Word2Vec and GloVe, in several key ways:

  • Contextualized embeddings: ELMo represents words as contextualized embeddings, meaning that the embedding for a word can change based on the context in which it is used. In contrast, Word2Vec and GloVe represent words as static embeddings, which do not take into account the context in which the word is used.
  • Deep learning approach: ELMo uses a deep model, specifically a multi-layer bidirectional language model, to generate word embeddings. Word2Vec and GloVe, on the other hand, use shallower methods: Word2Vec trains a shallow neural network to predict words from their neighbors, and GloVe fits word vectors to a global word co-occurrence matrix.
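
The difference between static and contextualized embeddings can be sketched with a toy example. This is not how ELMo computes its vectors: the two-dimensional vectors and the neighbor-averaging rule below are made up purely to show that a static lookup returns the same vector for “bank” in every sentence, while a context-dependent function does not.

```python
# Toy illustration only: NOT the ELMo algorithm.
# The vectors and the averaging rule are assumptions made for this example.

STATIC = {
    "the": [0.0, 0.0],
    "river": [0.0, 1.0],
    "money": [1.0, 0.0],
    "bank": [0.5, 0.5],
}

def static_embedding(sentence, index):
    """Static lookup: the vector depends only on the word itself."""
    return STATIC[sentence[index]]

def contextual_embedding(sentence, index):
    """Toy contextual vector: average the word's vector with its neighbors'."""
    lo, hi = max(0, index - 1), min(len(sentence), index + 2)
    window = [STATIC[w] for w in sentence[lo:hi]]
    dim = len(window[0])
    return [sum(v[d] for v in window) / len(window) for d in range(dim)]

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Same word, same static vector in both sentences.
print(static_embedding(s1, 2) == static_embedding(s2, 2))
# Same word, different vector once the neighbors are taken into account.
print(contextual_embedding(s1, 2) == contextual_embedding(s2, 2))
```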

To generate context-dependent embeddings, ELMo uses a bi-directional Long Short-Term Memory (LSTM) network trained as a language model. The LSTM processes the input sentence in both directions (left to right and right to left) and generates an embedding for each word based on its context in the sentence.

Overall, ELMo is a newer approach for representing words as vectors that has been shown to improve the performance of a variety of natural language processing tasks. It has become a popular choice for representing words in NLP models.

What is the model for training ELMo word embeddings?

The model used to train ELMo word embeddings is a bidirectional language model (biLM). It combines a forward language model, which is trained to predict the next word given the words that come before it, with a backward language model, which is trained to predict the previous word given the words that come after it. To train the ELMo model, researchers at the Allen Institute for Artificial Intelligence used a large text corpus, the 1 Billion Word Benchmark, which is drawn from news text. During training, the model learns to represent words as vectors (also called word embeddings) that capture the meaning of the word in the context of the sentence.

Explain in detail the bidirectional language model

A bidirectional language model is a type of neural network that models each word in a sentence using the context on both sides of it. It is called a “bidirectional” model because it combines two language models: a forward model that predicts a word from the words that come before it, and a backward model that predicts a word from the words that come after it.

To understand how a bidirectional language model works, it is helpful to first understand how a unidirectional language model works. A unidirectional language model is a type of neural network that is trained to predict the next word in a sentence given the context of the words that come before it.

A unidirectional language model can be represented by the following equation:

P(w[t] | w[1], w[2], …, w[t-1]) = f(w[t-1], w[t-2], …, w[1])

This equation says that the probability of a word w[t] at time t (where time is the position of the word in the sentence) is determined by a function f of the words that come before it (w[t-1], w[t-2], …, w[1]). The function f is learned by the model during training.
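
As a rough illustration of this equation, the sketch below stands in for the learned function f with simple counts over a made-up three-sentence corpus. It makes one strong simplifying assumption that a neural language model does not: only the last word of the history is used. The corpus, the `<s>` start token, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real language model trains on far more text.
CORPUS = [
    ["<s>", "the", "cat", "sat"],
    ["<s>", "the", "dog", "sat"],
    ["<s>", "the", "cat", "ran"],
]

# Count how often each word follows each previous word.
counts = defaultdict(Counter)
for sentence in CORPUS:
    for prev, word in zip(sentence, sentence[1:]):
        counts[prev][word] += 1

def f(history, word):
    """Stand-in for the learned function f: P(word | history).

    Simplifying assumption: only the last word of the history matters.
    """
    prev = history[-1]
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

def sentence_log_prob(sentence):
    """Sum log P(w[t] | w[1], ..., w[t-1]) over every word after the start token.

    Raises ValueError if the sentence contains a transition never seen in CORPUS.
    """
    return sum(math.log(f(sentence[:t], sentence[t])) for t in range(1, len(sentence)))

print(f(["<s>", "the"], "cat"))  # P("cat" | "<s> the") under the toy counts
print(math.exp(sentence_log_prob(["<s>", "the", "cat", "sat"])))
```

The probability of the whole sentence is the product of the per-word probabilities, which is why the code sums their logarithms.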

A bidirectional language model extends this by adding a second, backward model that predicts each word from the words that come after it:

P(w[t] | w[t+1], w[t+2], …, w[n]) = g(w[t+1], w[t+2], …, w[n])

This equation says that the probability of a word w[t] at time t is determined by a function g of the words that come after it. Like f, the function g is learned by the model during training. In ELMo, the forward and backward models are trained together, but each direction only sees one side of the word it predicts. The context from both sides is brought together afterwards, when the hidden states of the two directions are combined into an embedding.
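
In the ELMo paper, the forward and backward directions are kept as two separate language models and trained jointly. Using the same word symbols as above, the training objective can be written as the sum of the two log-likelihoods. The names Θ_fwd and Θ_bwd are labels introduced here for the parameters of each direction; they are not from the text above.

```latex
\sum_{t=1}^{n} \Big(
    \log P(w_t \mid w_1, \ldots, w_{t-1};\ \Theta_{\mathrm{fwd}})
  + \log P(w_t \mid w_{t+1}, \ldots, w_n;\ \Theta_{\mathrm{bwd}})
\Big)
```

Training maximizes this sum over all sentences in the corpus.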

In practice, a bidirectional language model is implemented as a neural network with two parts: a forward layer that processes the input words from left to right (w[1], w[2], …, w[t-1]), and a backward layer that processes the input words from right to left (w[n], w[n-1], …, w[t+1]). Each layer produces its own prediction for the word w[t], and the hidden states of the two layers are later combined to form the word's embedding. The forward and backward layers are typically implemented as recurrent neural networks (RNNs), most often long short-term memory (LSTM) networks, a kind of RNN designed to capture long-range dependencies in sequences of data. ELMo stacks two such bidirectional LSTM layers.

During training, the bidirectional language model is fed a sequence of words. At each position, the forward layer is trained to predict the next word and the backward layer is trained to predict the previous word. Each prediction is compared to the actual word in the sequence, the model’s weights are updated to minimize the difference between the prediction and the actual word, and this process is repeated for each word in the training dataset. After training, the bidirectional language model can be used to generate word embeddings by extracting the output of the forward and backward layers for each word in the input sequence.

ELMo model training algorithm

  1. Initialize the word vectors:
  • The word vectors are usually initialized randomly using a Gaussian distribution.
  • Alternatively, you can use pre-trained word vectors such as Word2Vec or GloVe.
  • ELMo itself builds its input word vectors from the characters of each word, using a convolutional network.
  2. Process the input sequence:
  • Input the words before position t (w[1], w[2], ..., w[t-1]) into the forward layer and the words after position t (w[t+1], w[t+2], ..., w[n]) into the backward layer.
  • The forward layer processes the words from left to right, and the backward layer processes the words from right to left.
  • Each layer has its own set of weights and biases, which are updated during training.
  3. Compute the output:
  • The output of each layer at position t is passed through a softmax layer, which turns it into a probability distribution over the vocabulary.
  • Each distribution is a prediction for the word w[t]: the forward layer predicts it from the words before it, and the backward layer from the words after it.
  4. Compute the loss:
  • The loss compares each predicted probability distribution with the true word w[t].
  • The loss function is usually the cross-entropy loss, which measures the difference between the predicted probability distribution and the true probability distribution. The losses of the two directions are added together.
  5. Update the weights and biases:
  • The weights and biases of the forward and backward layers are updated using gradient descent and backpropagation.
  6. Repeat steps 2-5 for all words in the input sequence.
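
The loss and update steps can be sketched in isolation. The example below is a deliberately tiny stand-in, not ELMo's training code: it treats the scores (logits) that each direction assigns to the vocabulary as the only trainable parameters, whereas a real biLM would backpropagate the same gradient further into its LSTM weights. The vocabulary, starting scores, and learning rate are made up.

```python
import math

VOCAB = ["cat", "dog", "sat"]  # hypothetical three-word vocabulary

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Loss step: negative log-probability assigned to the true word."""
    return -math.log(softmax(logits)[target_index])

def sgd_step(logits, target_index, lr=0.5):
    """Update step: gradient of the loss w.r.t. the logits is (probs - one_hot)."""
    probs = softmax(logits)
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

# Made-up scores that the forward and backward directions assign to w[t].
fwd_logits = [0.0, 0.0, 0.0]
bwd_logits = [0.2, -0.1, 0.0]
target = VOCAB.index("sat")  # the true word w[t]

# The total loss is the sum of the losses of the two directions.
loss_before = cross_entropy(fwd_logits, target) + cross_entropy(bwd_logits, target)
fwd_logits = sgd_step(fwd_logits, target)
bwd_logits = sgd_step(bwd_logits, target)
loss_after = cross_entropy(fwd_logits, target) + cross_entropy(bwd_logits, target)

print(loss_before, loss_after)  # the loss goes down after one update
```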

ELMo generates contextualized word embeddings by combining the hidden states of a bi-directional language model (biLM) in a specific way.

The biLM has two directions: a forward layer that processes the input words from left to right, and a backward layer that processes the input words from right to left. The hidden state of the biLM at each position t is a vector h[t] that represents the context of the word at that position.

To generate the contextualized embedding for a word, ELMo concatenates the forward and backward hidden states at each layer and then takes a task-specific weighted sum over all biLM layers. The weighting is controlled by a set of layer weights s^task, which are normalized with a softmax so that they sum to 1, and a scalar γ^task that scales the whole vector. The ELMo embedding for the word at position k is computed from the hidden states of all L layers of the biLM, together with the context-independent token layer (j = 0):

ELMo_k^task = E(R_k; Θ^task) = γ^task * (s_0^task * h_{k,0}^LM + s_1^task * h_{k,1}^LM + … + s_L^task * h_{k,L}^LM)

Here, h_{k,j}^LM represents the hidden state at position k and layer j of the biLM, s_j^task is the weight of layer j, and γ^task is the scaling factor. The layer weights and the scaling factor are learned while training the downstream task, so the hidden states are combined in a way that suits that task.
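
A minimal sketch of this layer combination is shown below. It follows the formulation in the ELMo paper, in which the per-layer weights are passed through a softmax and a single scalar γ rescales the result. The function name, the hidden-state values, and the weights are made up for illustration; a real implementation would operate on tensors for a whole batch.

```python
import math

def softmax(xs):
    """Normalize raw layer weights so that they are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_combine(layer_states, raw_weights, gamma):
    """Collapse the per-layer vectors of one token into a single ELMo vector.

    layer_states: list of L+1 vectors (index 0 is the token layer).
    raw_weights:  one unnormalized weight per layer; softmax gives s_j.
    gamma:        scalar that scales the whole vector.
    """
    s = softmax(raw_weights)
    dim = len(layer_states[0])
    return [gamma * sum(s_j * h[d] for s_j, h in zip(s, layer_states))
            for d in range(dim)]

# Made-up hidden states for one token: 3 layers, 4 dimensions each.
h_k = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 2.0, 2.0, 0.0],
]

# Equal raw weights give a plain average of the layers.
print(elmo_combine(h_k, raw_weights=[0.0, 0.0, 0.0], gamma=1.0))
# A large weight on the last layer makes it dominate; gamma doubles the scale.
print(elmo_combine(h_k, raw_weights=[0.0, 0.0, 5.0], gamma=2.0))
```

Because the weights are learned per task, one task can lean on the lower layers while another leans on the top layer, without retraining the biLM itself.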

Using ELMo for NLP tasks

ELMo can be used to improve the performance of supervised NLP tasks by providing context-dependent word embeddings that capture not only the meaning of the individual words, but also their context in the sentence.

To use a pre-trained bi-directional language model (biLM) for a supervised NLP task, the first step is to run the biLM and record the layer representations for each word in the input sequence. These layer representations capture the context-dependent information about the words in the sentence, and can be used to augment the context-independent token representation of each word.

Most supervised NLP models share a common architecture at the lowest layers: they first build a context-independent token representation for each word. This allows ELMo to be added to the model in a consistent and unified manner, by simply concatenating the ELMo embeddings with the context-independent token representation of each word.

The model then combines the ELMo embeddings with the context-independent token representation to form a context-sensitive representation h_k, typically using either bidirectional RNNs, CNNs, or feed-forward networks. The context-sensitive representation h_k is then used as input to the higher layers of the model, which are task-specific and encode the information needed to perform the target NLP task.

It can be helpful to add a moderate amount of dropout to ELMo and to regularize the ELMo layer weights by adding an L2 penalty on them to the loss function. This can help to prevent overfitting and improve the generalization ability of the model.
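
The concatenation, dropout, and regularization described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a full task model: the vectors, the dropout rate of 0.5, and the penalty strength `lam` are placeholder values, and the helper names are hypothetical.

```python
import random

def concat_with_elmo(token_vec, elmo_vec):
    """Build the enhanced input by concatenating token vector and ELMo vector."""
    return list(token_vec) + list(elmo_vec)

def dropout(vec, rate, rng):
    """Inverted dropout (training time only): zero each entry with
    probability `rate` and rescale the surviving entries."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

def l2_penalty(weights, lam):
    """Regularization term lam * ||w||^2, to be added to the task loss."""
    return lam * sum(w * w for w in weights)

x_k = [0.1, 0.2, 0.3]            # made-up context-independent token vector
elmo_k = [0.5, -0.5, 1.0, 0.0]   # made-up ELMo vector for the same token
rng = random.Random(0)           # fixed seed so the run is repeatable

enhanced = concat_with_elmo(x_k, dropout(elmo_k, rate=0.5, rng=rng))
print(len(enhanced))  # length of token vector plus length of ELMo vector

# Penalty on hypothetical ELMo layer weights, added to the task loss.
print(l2_penalty([0.5, 0.3, 0.2], lam=0.001))
```

The enhanced vector is what the task model's RNN, CNN, or feed-forward layers would receive in place of the plain token vector.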