Did your psychiatrist talk to you about CYP2D6, SLC6A4, and HTR2A genes and SSRI response?

I think this is a very important topic. There is now sufficient evidence to indicate that people with specific variants of the CYP2D6, SLC6A4, and HTR2A genes are unlikely to respond to SSRIs. The evidence indicates that Caucasian females in particular are unlikely to respond to SSRIs if they carry the SLC6A4 S/S and HTR2A G/G genotypes. Evidence also suggests that they may not respond to SNRIs either.

If you are in this population, I wonder if your psychiatrist spoke to you about this. I think it’s a pretty big deal, given the sufficient evidence for Caucasians.
I am an Eastern European female, and I had no response at all to any SSRIs or SNRIs, or any medication in general so far. I had trials of mirtazapine, sertraline, Abilify, Latuda, risperidone, olanzapine, fluoxetine, Pristiq, Cymbalta, and Seroquel. I was then referred to a more specialized psychiatric hospital, where they performed genetic testing. The results indicated that I have the SLC6A4 S/S and HTR2A G/G genotypes. The medical records state the following:

SLC6A4 S/S
Homozygous for the short promoter polymorphism of the serotonin transporter gene. The short promoter allele is reported to decrease expression of the serotonin transporter compared to the homozygous long promoter allele. The patient may experience a delayed response with selective serotonin reuptake inhibitors, or may benefit from non-selective antidepressants.

HTR2A G/G
Homozygous variant for the G allele for the serotonin receptor type 2a. Two copies of the G allele. This genotype has been associated with an increased risk of adverse drug reactions with certain selective serotonin reuptake inhibitors.

CYP2D6 intermediate metabolizer – Higher plasma concentrations may increase the probability of side effects. Consider a lower starting dose and slower titration schedule as compared with normal metabolizers.

I think that given that I have not responded to any of these medications (each one was tried for over 8 weeks), and given these test results, it's pretty clear that I am very unlikely to respond to any other SSRIs or SNRIs. I had a very good neuropsychiatrist at the psychiatric hospital, but unfortunately I was transferred to another hospital due to pregnancy. Now I have a psychiatrist who is a resident, so she does not have a lot of experience. I was prescribed lamotrigine and fluoxetine. I think the lamotrigine makes sense, given that I have not tried it, but she only gave me 25mg per day. I don't think the fluoxetine makes sense, because it's an SSRI and I have already tried it. I also stopped sleeping starting the first day I began taking it. I have been sleeping only 3-4 hours a day since I started it 8 days ago.

I wonder if anyone has had a good doctor who discussed genetic testing with them, and what their suggestions were. What are the options if there is no response to SSRIs and SNRIs? I don't think my resident psychiatrist has enough experience with this.

RNN-based encoder-decoder architecture explained

RNN-based Encoder-Decoder

Encoder
An encoder transforms the input data into a different representation, usually a fixed-size context vector. The input data x can be a sequence or a set of features. The encoder maps this input to a context vector c, which is a condensed representation of the input data. Mathematically, this can be represented as:

\textbf{c = f(x)}
 c \text{ - context vector, } x \text{ - input data}

In the case of a sequence, such as a sentence in a language translation task, the encoder might process each element of the sequence (e.g., each word) sequentially. If the encoder is a recurrent neural network (RNN), the transformation f can involve updating the hidden state h at each step:

\boldsymbol{h_t = f(h_{t-1}, x_t)}
 h_t \text{ - hidden state at time } t
 x_t \text{ - input at time } t
 h_{t-1} \text{ - hidden state at time } t-1

The final hidden state h_T can be used as the context vector c for the entire input sequence.
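As a concrete illustration, here is a minimal NumPy sketch of such an encoder, using a vanilla RNN update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The dimensions, weights, and helper names below are assumptions made for this example, not any particular library's API.

import numpy as np

def rnn_encoder(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over the input sequence; return all hidden states."""
    h = np.zeros(W_hh.shape[0])          # h_0
    hidden_states = []
    for x_t in x_seq:                    # h_t = f(h_{t-1}, x_t)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.array(hidden_states)

# Toy example: a "sentence" of 5 tokens, each embedded as a 4-dimensional vector
rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 4))
W_xh = rng.normal(size=(8, 4)) * 0.1
W_hh = rng.normal(size=(8, 8)) * 0.1
b_h = np.zeros(8)

hidden_states = rnn_encoder(x_seq, W_xh, W_hh, b_h)
context_vector = hidden_states[-1]       # c = h_T
print(context_vector.shape)              # (8,)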

Decoder
The decoder takes the context vector c and generates the output data y. In many applications, the output is also a sequence, and the decoder generates it one element at a time. Mathematically, the decoder’s operation can be represented as:

\boldsymbol{y_t = g(y_{t-1}, h_t, c)}
 y_t \text{ - output at time } t
 h_t \text{ - hidden state at time } t

In many sequence-to-sequence models, the decoder is also an RNN, and its hidden state is updated at each step by its own state-update function (written g' here to distinguish it from the output function g above):

\boldsymbol{h_t = g'(h_{t-1}, y_{t-1}, c)}


The encoder-decoder framework, particularly in the context of sequence-to-sequence models, is designed to handle sequences of variable lengths both on the input and the output sides.

Output Generation (Decoder)

Initial State: The decoder is initialized with the context vector c as its initial state:

h'_0 = c

Start Token: The decoder receives a start-of-sequence token SOS as its first input y_0.
Decoding Loop: At each step t, the decoder generates an output token y_t and updates its hidden state h'_t.
Variable Length Output: The decoder continues to generate tokens one at a time until it produces an end-of-sequence token EOS. The length of the output sequence Y = (y_1, y_2, …, y_m) is not fixed and can be different from the input length n. The process is as follows:

y_t = Decode(h'_{t-1}, y_{t-1})
h'_t = UpdateState(h'_{t-1}, y_{t-1})
\text{for t = 1 to m, where m can be different from n}

Stopping Criterion: The loop stops when the EOS token is generated, or after producing the maximum allowed length for the output sequence.
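Here is a minimal NumPy sketch of this decoding loop. The Decode and UpdateState steps are implemented with toy random weights, and the token ids, dimensions, and greedy token selection are all assumptions made for illustration.

import numpy as np

# Toy dimensions and special token ids (illustrative assumptions)
VOCAB, HIDDEN, MAX_LEN = 10, 8, 20
SOS, EOS = 0, 1

rng = np.random.default_rng(0)
W_hy = rng.normal(size=(VOCAB, HIDDEN)) * 0.1    # state -> token logits
W_yy = rng.normal(size=(VOCAB, VOCAB)) * 0.1     # previous token -> token logits
W_hh = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1   # state -> state
W_yh = rng.normal(size=(HIDDEN, VOCAB)) * 0.1    # previous token -> state

def decode_step(h_prev, y_prev):
    # y_t = Decode(h'_{t-1}, y_{t-1}): greedy pick from the logits
    logits = W_hy @ h_prev + W_yy @ np.eye(VOCAB)[y_prev]
    return int(np.argmax(logits))

def update_state(h_prev, y_prev):
    # h'_t = UpdateState(h'_{t-1}, y_{t-1})
    return np.tanh(W_hh @ h_prev + W_yh @ np.eye(VOCAB)[y_prev])

c = rng.normal(size=HIDDEN)   # context vector produced by the encoder
h, y = c, SOS                 # h'_0 = c, and the first input is the SOS token
outputs = []
for t in range(MAX_LEN):      # stop at EOS or at the maximum allowed length
    y_t = decode_step(h, y)
    h = update_state(h, y)
    y = y_t
    if y == EOS:
        break
    outputs.append(y)
print(outputs)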

The decoder can also be represented using the probability distribution of the next token given the previous tokens and the context vector c from the encoder:

p(y_t|y_{<t}, c)

The full sequence probability is the product of the individual token probabilities: the decoder generates a sequence token by token, and the probability of the sequence Y given the context vector c can be described by the chain rule of probability:

p(Y|c) = p(y_1|c) * p(y_2|y_1, c) * ... * p(y_m|y_{<m}, c)

How do we obtain these conditional probabilities?
– For each time step t from 1 to m (m is to be determined):

• The decoder takes the previous hidden state h'_{t-1} and the previously generated token y_{t-1} as inputs.
• The function f_{\theta_{dec}}, parametrized by the decoder's weights \theta_{dec}, computes the current hidden state h'_t and the logit vector l_t, which precedes the probability distribution for the next token:

(h'_{t-1}, y_{t-1}) \xrightarrow{f_{\theta_{dec}}} (l_t, h'_t)

• The logit vector is computed by multiplying the decoder's output representation y' by the transposed word embedding matrix:

l_t = W_e^T y' + b

• The logit vector l_t is passed through a softmax layer to obtain the probability distribution for the next token y_t:

p(y_t | y_{<t}, c) = Softmax(l_t)
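As a small NumPy sketch of this last step, here is a logit vector turned into a probability distribution and a next token sampled from it. The vocabulary size, the weight shapes, and the orientation of W_e (chosen so that no explicit transpose is needed) are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 10, 8

W_e = rng.normal(size=(vocab_size, hidden_dim)) * 0.1   # word embedding matrix
b = np.zeros(vocab_size)
y_dec = rng.normal(size=hidden_dim)                     # decoder output representation y'

# Logits over the vocabulary: l_t = W_e y' + b (W_e already has shape (vocab, hidden))
logits = W_e @ y_dec + b

# Softmax to obtain p(y_t | y_{<t}, c)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample the next token and accumulate its log-probability (chain rule over the sequence)
y_t = rng.choice(vocab_size, p=probs)
log_p_sequence = np.log(probs[y_t])    # add log p(y_t | y_{<t}, c) at every further step
print(y_t, probs[y_t])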

Token Generation:

• A token is sampled from the probability distribution p(y_t|y_{<t}, c), which becomes the next token in the sequence, y_t.
• This token is then used as the input for the next time step.

Sequence Continuation:
– This process repeats, with the decoder generating one token at a time, updating its hidden state, and adjusting the probability distributions for subsequent tokens based on the current sequence.

Stopping Criterion:
– The loop continues until the decoder generates an EOS token, indicating the end of the sequence, or until it reaches a predefined maximum sequence length.

Using Objective/Technical Reading as a Tool Against Depressive Rumination

I was diagnosed with clinical depression in 2015, and it has been on and off since then. Because of this diagnosis, I naturally became interested in medical and talk therapy treatments for depression. In grad school, I had the opportunity to work with a dataset of Facebook posts from users labeled as depressed or non-depressed based on a standard clinical questionnaire.

Using natural language processing (NLP) techniques, one of my findings was that depressed people use more personal pronouns in their text, such as “I”, “he”, “she”, and “we”. For instance, I noticed in my own experiences that when I am more depressed, I tend to ruminate more—thinking about how “I” am unlucky not to have many relatives, or how it’s unfair that he/she (some person that I know) is smarter or has a better job or a better life.

I found a skill that helps manage these thoughts. When I catch myself ruminating, I try to engage in reading something technical or objective that doesn’t involve personal pronouns or comparisons or human relationships in general. For example, I might read an article about Python vs Julia, or why high blood sugar is dangerous, or where turtles go in winter in Ontario. I find that even if the ruminative thoughts continue, forcing myself to read and focus on these kinds of articles can help prevent my ruminative thoughts from escalating.

I am not sure what type of skill this would be called, CBT or DBT, but I think it relates most closely to the DBT skill of "opposite action". This skill is based on doing the opposite of what our emotions or mind are telling us to do. So if my mind is telling me to sit and ruminate about my life, myself, and myself vs. others, I do the opposite: I read something that doesn't involve any personal life at all.

Reset my supplements and caffeine, started to feel better!

I was feeling better in the summer: I was swimming a lot, hiking, and being out in the sun. Then it started getting colder and I was out less. I also started applying for jobs, so I was sitting in front of my laptop a lot. I became stressed because I have received only a few replies despite multiple applications. I am also trying to have a child and it hasn't been working, so I have been feeling upset about that as well.
I decided to try to feel better: I started taking NAC (600-1200 mg in the evening), started drinking more coffee, and started consuming yogurt and kefir for probiotics. Then, over the last several weeks, I started to feel even worse. Very severe brain fog, as if I am not sure whether I am participating in life or just observing it and it's happening to someone else. I felt wrapped in gray fog, as if everything was outside the fog, at a distance from me. I also started to feel dizzy.

I’m glad that I remembered that this happened before when I added probiotic supplements and 5-HTP to “feel better”. I actually ended up with a psychotic episode, pretty sure that it was caused by 5-HTP.

So I decided to reset everything. Instead of adding more supplements, I stopped all of them. I stopped taking NAC and stopped eating kefir and yogurt. I'm also currently not consuming anything with lactose or a lot of sugar, just eating regular healthy food: lentils, vegetables, chicken, salmon, brown rice, etc. I stopped coffee in the morning and now only have one coffee a day in the afternoon (and make it half decaf).

Well, I am actually feeling better now!

I think what happened is that I naturally felt worse as the weather got colder, which is normal, and I had added stress from not getting replies to my applications and from fertility issues, which also makes one feel worse. Then I suddenly added all of these supplements plus stimulants (more caffeine), and ended up feeling just as bad as before, plus brain fog!

Now I am feeling better in terms of brain fog and I am trying to just use CBT to deal with my situation, instead of supplements.

I'm not saying that supplements can't help; it's very personal. I just wanted to share my story: sometimes adding several supplements plus more caffeine can actually cause brain fog / depersonalization.

Percentile Confidence Interval Calculator

The calculation makes use of the binomial distribution properties, making an assumption that our data can be modeled by a binomial distribution. This assumption may not always be accurate, especially for continuous data, but it provides an approximation for our purposes.

Assumptions

1. Binary Outcome: The fundamental assumption behind the binomial distribution is that there is a binary outcome, often termed 'success' and 'failure'. In the context of percentiles, a 'success' can be defined as an observation falling on one side of the percentile; the code below counts observations above the 70th percentile as 'successes'. Either convention works as long as it is applied consistently.

2. Fixed Number of Trials: For the binomial distribution, there is a fixed number n of trials. In our case, n represents the total number of data points in our sample.

3. Independence: Each trial (or data point) is independent of others. This means the outcome of one trial does not affect the outcome of another.

4. Constant Probability of Success: The probability of success, q, is the same for each trial and is determined by the percentile of interest. For the 70th percentile, q = 0.7 if 'success' means falling below the threshold; with the 'above the threshold' convention used in the code below, the corresponding value is 1 - 0.7 = 0.3 (the code estimates it from the sample as q ≈ 0.28).

Why the Binomial Distribution?

The rationale behind using the binomial distribution for percentile confidence intervals is its direct applicability to cases where you’re looking at the proportion of observations below a certain threshold (i.e., a percentile).

When you’re asking about the 70th percentile, you’re essentially inquiring: “What’s the value below which 70% of my data falls?” This can be likened to asking about the number of successes in n trials, where a success is an observation below the desired threshold.

However, it’s important to note that this method provides an approximation. The binomial distribution is discrete and inherently based on counting successes in a set number of trials, while percentiles often come from continuous distributions and may not perfectly adhere to the assumptions above.

import numpy as np
from scipy.stats import binom
import seaborn as sns

Get some data

# Load the Iris dataset
iris = sns.load_dataset("iris")
# Use the 'sepal_length' feature
data = iris['sepal_length'].values

print(data[:50])

[5.1 4.9 4.7 4.6 5.  5.4 4.6 5.  4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.  5.  5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.
 5.5 4.9 4.4 5.1 5.  4.5 4.4 5.  5.1 4.8 5.1 4.6 5.3 5. ]

Calculate the 70th percentile

# Calculate the 70th percentile
percentile_70 = np.percentile(data, 70)
print("Min: %f, Max: %f, 70th percentile: %f" % (min(data), max(data), percentile_70))

Min: 4.300000, Max: 7.900000, 70th percentile: 6.300000

Convert the data to “success” (above the 70th percentile) and “failure”

successes = np.sum(data > percentile_70)
failures = len(data) - successes

# Now, `successes` is analogous to `q * n` in the binomial scenario.
# So, we can set:
n = len(data)
q = successes / n

print("n: %d, q: %f" % (n, q))

n: 150, q: 0.280000

Calculate the 95% confidence interval

The code calculates potential upper (u) and lower (l) bounds for a confidence interval using the binomial distribution’s percent-point function (ppf).

np.ceil(binom.ppf(1 - alpha / 2, n, q)) determines the approximate upper bound for the confidence interval and np.ceil(binom.ppf(alpha / 2, n, q)) the lower bound.

+ np.arange(-2, 3) extends these bounds by adding an array of [-2, -1, 0, 1, 2], generating a set of potential boundaries around the original estimate.

u gives a sequence of candidate ranks in the dataset that could serve as the upper bound of the confidence interval. It is centered on the binomial distribution's 97.5% quantile of the count, with two candidates below it and two above it.

l gives a sequence of candidate ranks that could serve as the lower bound of the confidence interval. It is centered on the binomial distribution's 2.5% quantile of the count, with two candidates below it and two above it.

alpha = 0.05
u = np.ceil(binom.ppf(1 - alpha / 2, n, q)) + np.arange(-2, 3)
u[u > n] = np.inf

l = np.ceil(binom.ppf(alpha / 2, n, q)) + np.arange(-2, 3)
l[l < 0] = -np.inf

print("u: " + ", ".join(map(str, u)))
print("l: " + ", ".join(map(str, l)))

u: 51.0, 52.0, 53.0, 54.0, 55.0
l: 29.0, 30.0, 31.0, 32.0, 33.0
sorted_data = np.sort(data)

# Extract values corresponding to the indices
# Correct way to interpret the u and l values
u_values = sorted_data[n - u.astype(int)]
l_values = sorted_data[l.astype(int) - 1]

print("Upper values:", u_values)
print("Lower values:", l_values)

Upper values: [6.3 6.2 6.2 6.2 6.2]
Lower values: [5.  5.  5.  5.  5.1]

Probability coverage

The code calculates the probability coverage of different combinations of potential confidence intervals formed by the lower bounds (l) and upper bounds (u). Coverage is a matrix of probabilities. The goal is to find the smallest confidence interval that guarantees coverage of at least 1−α.

coverage = np.zeros((len(l), len(u)))
for i, a in enumerate(l):
    for j, b in enumerate(u):
        coverage[i, j] = binom.cdf(b - 1, n, q) - binom.cdf(a - 1, n, q)

if np.max(coverage) < 1 - alpha:
    i = np.where(coverage == np.max(coverage))
else:
    i = np.where(coverage == np.min(coverage[coverage >= 1 - alpha]))

print("Coverage Matrix:")
print(coverage)

print("\nOptimal Indices (i_l, i_u):")
print(i)

Coverage Matrix:
[[0.93135214 0.95028522 0.96430299 0.97438285 0.98142424]
 [0.92730647 0.94623955 0.96025732 0.97033718 0.97737857]
 [0.92096076 0.93989385 0.95391161 0.96399148 0.97103286]
 [0.91140808 0.93034117 0.94435894 0.9544388  0.96148018]
 [0.89759319 0.91652627 0.93054404 0.9406239  0.94766529]]

Optimal Indices (i_l, i_u):
(array([0], dtype=int64), array([1], dtype=int64))
i_l = i[0][0]
i_u = i[1][0]
print("Chosen row of coverage matrix: %d, chosen column of coverage matrix: %d" % (i_l, i_u))

u_final = min(n, u[i_u])
u_final = max(0, int(u_final)-1)
        
l_final = min(n, l[i_l])
l_final = max(0, int(l_final)-1)

# Actual value corresponding to u_final and l_final
upper_value_threshold = n - u_final
lower_value_threshold = l_final

upper_value = sorted_data[upper_value_threshold]
lower_value = sorted_data[lower_value_threshold]

print("Lower bound value:", lower_value)
print("Upper bound value:", upper_value)

Chosen row of coverage matrix: 0, chosen column of coverage matrix: 1
Lower bound value: 5.0
Upper bound value: 6.3
import matplotlib.pyplot as plt

# Plotting the histogram
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, color='skyblue', edgecolor='black', alpha=0.7, label='Data')

# Adding vertical lines for lower_value and upper_value
plt.axvline(lower_value, color='red', linestyle='--', label='Lower bound')
plt.axvline(upper_value, color='green', linestyle='--', label='Upper bound')

# Adding vertical line for the 70th percentile
plt.axvline(percentile_70, color='purple', linestyle='-.', label='70th Percentile')

# Adding title and labels
plt.title('Histogram of Data with Confidence Bounds and 70th Percentile')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

plt.show()

Bootstrap method

A commonly used alternative method to calculate confidence intervals for percentiles (also known as quantiles) is the Bootstrap method.

The Bootstrap method involves resampling the dataset multiple times with replacement and then computing the desired statistic (in this case, the 70th percentile) for each of these resampled datasets. This gives a distribution of the 70th percentiles from which we can compute the confidence interval.

lower: This represents the value below which the bottom 2.5% of the recorded bootstrap 70th percentiles fall. In other words, it's like saying, "In 2.5% of our bootstrap 'experiments,' the 70th percentile was below this value."

upper: This is the value below which 97.5% of the recorded bootstrap 70th percentiles fall. Put another way, "In 97.5% of our bootstrap 'experiments,' the 70th percentile was below this value."

import numpy as np

def bootstrap_percentile_CI(data, percentile=70, alpha=0.05, B=10000):
    """Calculate the bootstrap confidence interval for a given percentile."""
    n = len(data)
    resampled_percentiles = []

    for _ in range(B):
        resample = np.random.choice(data, n, replace=True)
        resampled_percentiles.append(np.percentile(resample, percentile))

    lower = np.percentile(resampled_percentiles, 100 * alpha/2)
    upper = np.percentile(resampled_percentiles, 100 * (1-alpha/2))
    
    return lower, upper

# Calculate the bootstrap 70th percentile confidence interval
lower_bootstrap, upper_bootstrap = bootstrap_percentile_CI(data)
print("Bootstrap 70th percentile CI: (%.2f, %.2f)" % (lower_bootstrap, upper_bootstrap))

Bootstrap 70th percentile CI: (6.10, 6.43)
# Plotting
plt.hist(data, bins=30, color='lightblue', edgecolor='black', alpha=0.7)
plt.axvline(x=np.percentile(data, 70), color='green', linestyle='--', label="True 70th Percentile")
plt.axvline(x=lower_bootstrap, color='red', linestyle='--', label="Lower Bound of CI")
plt.axvline(x=upper_bootstrap, color='blue', linestyle='--', label="Upper Bound of CI")
plt.legend()
plt.title('Histogram of Sepal Length with Bootstrap CI for 70th Percentile')
plt.xlabel('Sepal Length')
plt.ylabel('Frequency')
plt.show()

Discussion

The bootstrap method makes minimal assumptions about the distribution of the data, making it versatile for a wide variety of datasets. This flexibility allows the bootstrap to handle complex or unknown data distributions, whereas the binomial method assumes data follows a binomial distribution and is mainly suited for binary outcomes. While the binomial approach is computationally simpler and quicker, it might not always provide an accurate representation, especially if the underlying assumptions aren’t met. In contrast, the bootstrap can be more computationally intensive due to resampling but offers the advantage of being more adaptable and often provides a more accurate estimate for datasets that don’t strictly adhere to a binomial distribution.

Plant of the Day – Common Haircap Moss

Common haircap moss is also known as Polytrichum commune. This moss is often found in Ontario forests, and I find it very beautiful because it looks like miniature pine trees. I took the photo below in the Sheffield Conservation Area. Common haircap moss has tall, upright stems, so it looks quite different from most other moss types.


Common haircap moss is one of the most widespread mosses, found on all continents, including Antarctica. It can thrive in a variety of habitats, from acidic bogs to well-drained sandy soils. Like other mosses, common haircap moss reproduces via spores rather than seeds. These spores are released from capsules that rise high above the leafy stems. Mosses, including the common haircap moss, have a unique and complex life cycle that involves alternating between two generations: the gametophyte and the sporophyte. Mosses reproduce using spores that form in capsules on the sporophyte generation. When mature, these capsules release spores into the environment. Upon landing in suitable conditions, the spores germinate and grow into the gametophyte generation, which is the leafy structure commonly recognized as moss. These leafy structures produce male and female reproductive organs. When conditions are moist, the male organs release sperm that swims to fertilize the eggs in the female organs. After fertilization, a new sporophyte grows from the gametophyte. This sporophyte matures and produces a capsule filled with spores.

So mosses don't require pollination in the same way that flowering plants do! Mosses depend on water for fertilization. The male structures release sperm only when the conditions are moist enough. Mosses can often fertilize themselves if both male and female reproductive structures are present on the same plant. However, cross-fertilization between different plants can also occur, especially in dioicous species where individual plants are either male or female.

The leaves of common haircap moss have structures called lamellae that increase the surface area for photosynthesis. The leaves of the common haircap moss have a distinct structure. They possess a central nerve, or midrib, which is less common in moss species. This central nerve provides structural support to the leaf, helping it stand upright. Additionally, it plays a role in the transport of water within the leaf. Unlike vascular plants, mosses don’t have a system of vessels to move water. Mosses, including the common haircap moss, absorb water directly through their leaves and stems rather than through specialized roots like many other plants. Instead of true roots, mosses have hair-like structures called rhizoids. These rhizoids anchor the moss to its substrate (like soil or bark) and also absorb water and some nutrients. However, the primary function of rhizoids is anchorage rather than absorption. When water is scarce, mosses can dry out and enter a dormant state. Once water is available again, they can quickly rehydrate and resume their metabolic activities.

Plant of the Day – Ghost Pipe

Ghost Pipe, also known as Indian Pipe or Monotropa uniflora, is a non-photosynthetic flowering plant. It's neither a fungus nor a typical flower in terms of how it obtains its food, making it a particularly intriguing member of the plant world. It can be found in various parts of North America, from Alaska to California in the west and from Nova Scotia to Florida in the east. It also grows in parts of Asia, such as in the Himalayas and northeastern Siberia. It prefers shaded, moist, mature forests, especially those with abundant leaf litter and decaying organic matter, which support the mycorrhizal fungi it relies on. The plant often grows under trees such as oaks and pines, benefiting from the fungi that form mutualistic relationships with these trees.

An individual Indian pipe plant is ephemeral in nature. After it emerges from the ground, it flowers, sets seed, and then decays over a relatively short period, often within a week or two. The perennial part of the plant, its underground structures (like rhizomes), can live for several years. These structures can give rise to above-ground shoots annually for many years, under suitable conditions.

I took the photo below this August. This Ghost Pipe was found in the beautiful Meisel Woods Conservation Area, Ontario, Canada.

  • Lacks Chlorophyll: Unlike most plants, Monotropa uniflora lacks chlorophyll, the green pigment that enables photosynthesis. This absence gives it its distinctive pale or translucent appearance.
  • Mycotrophic Nutrition: Since it can’t produce its own food through photosynthesis, Monotropa uniflora obtains its nutrients through a unique relationship with mycorrhizal fungi. These fungi have a mutualistic relationship with certain trees, exchanging nutrients with them. So, indirectly, Monotropa uniflora is obtaining nutrients from the trees via the fungi. In essence, it “eats” by parasitizing the fungi that are associated with tree roots.
  • Fungi Partners: The primary group of fungi that associate with Monotropa uniflora are from the Russulaceae family. This includes fungi that are ectomycorrhizal with trees, forming a sheath around tree roots where nutrient exchange occurs.
  • Reproduction: It’s a flowering plant, so it reproduces via seeds. The flowers are typically pollinated by bees. After pollination, the flower stands erect from its previously nodding position. The seeds of Monotropa uniflora contain very little energy reserve, so they require the presence of the right fungus to germinate. Once a seed comes into contact with a compatible fungus, the fungus will invade the seed and initiate the growth of the plant.
  • Indirect Relationship with Trees: The ectomycorrhizal fungi have a symbiotic relationship with trees. In this mutualistic relationship:
    – Trees provide the fungi with sugars produced through photosynthesis.
    – The fungi provide trees with essential minerals and water from the soil.
    – Monotropa uniflora exploits this system by drawing nutrients from the fungi, which indirectly means it’s obtaining resources from the trees, although it doesn’t directly parasitize the trees themselves.
  • Dependence on the Fungi-Tree Relationship: It’s essential to understand that without the fungi-tree relationship, Monotropa uniflora couldn’t survive. The plant is entirely dependent on the organic compounds produced by photosynthetic plants (trees) and transferred through the mycorrhizal fungi.

In the relationship between Monotropa uniflora and the fungi it associates with, it might seem that the fungi get the short end of the stick, since the Indian pipe is effectively parasitizing the fungus. However, when looking at this association in the broader context of the forest ecosystem, some potential indirect benefits or considerations arise:

Seed Germination and Fungal Propagation: The initial interaction between the fungi and the Indian pipe seeds can promote fungal colonization. The process of seed germination and growth might offer conditions conducive for the fungal hyphae to spread and grow.

Promotion of Fungal Diversity: Interactions with plants like Monotropa uniflora might play a role in promoting fungal diversity in forest ecosystems. A diverse mycorrhizal network can enhance soil health and overall forest resilience.

Fructose Malabsorption – Applying the Luhn algorithm for text summarization

The Luhn algorithm is a text summarization technique that uses statistical properties of the text to identify and extract the most important sentences from a document. The algorithm was developed by H.P. Luhn in the 1950s, and is still widely used in various forms today.

The Luhn algorithm works by first analyzing the frequency of each word in the document, and then assigning a score to each sentence based on the frequency of the words it contains. Sentences that contain words that are more frequent in the document as a whole are considered to be more important, and are assigned higher scores. The algorithm then selects the top-scoring sentences and concatenates them together to form the summary. The length of the summary is usually determined in advance by the user, and the algorithm selects the most important sentences that fit within that length limit.

It works by identifying the most salient or important sentences in a document based on the frequency of important words and their distribution within each sentence. First, the algorithm removes stopwords, which are common words such as “the”, “and”, and “a” that do not carry much meaning. Additionally, one could apply stemming, which reduces words to their base or root form. For example, “likes” and “liked” are reduced to “like”. Then, the algorithm looks for important words in each sentence. These are typically nouns, verbs, and adjectives that carry the most meaning. The specific method for identifying important words may vary depending on the implementation of the algorithm, but in general, they are selected based on their frequency and relevance to the topic of the text.

The algorithm counts the number of important words in each sentence and divides it by the span, or the distance between the first and last occurrence of an important word. This gives a measure of how densely the important words are distributed within the sentence. Finally, the algorithm ranks the sentences based on their scores, with the highest scoring sentences considered the most important and selected for the summary.

Here are the step-by-step instructions for the Luhn algorithm:

  1. Preprocess the text: Remove any stop words, punctuation, and other non-textual elements from the document, and convert all the remaining words to lowercase.
  2. Calculate the word frequency: Count the number of occurrences of each word in the document, and store this information in a frequency table.
  3. For each sentence, calculate the score by:
    a. Identifying the significant words (excluding stop words) that occur in the sentence.
    b. Ordering the significant words by their position in the sentence.
    c. Determining the distance between adjacent significant words (the “span”).
    d. Calculating a score for the sentence as the sum of the square of the number of significant words divided by the span for each adjacent pair of significant words.
  4. Select the top-scoring sentences: Sort the sentences in the document by their score, and select the top-scoring sentences up to a maximum length L. The length L is typically chosen by the user in advance, and represents the maximum number of words or sentences that the summary can contain.
  5. Generate the summary: Concatenate the selected sentences together to form the summary.
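Below is a rough Python sketch of these steps. The stopword list, the choice of the top-N most frequent words as "significant", and the count-squared-over-span scoring are simplifications assumed for illustration; real implementations differ in the details.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "with", "that", "as", "it", "on", "be", "this", "not"}

def luhn_summary(text, top_n_words=25, top_n_sentences=15):
    # 1. Split into sentences and preprocess words (lowercase, drop stopwords)
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    def words(s):
        return [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]

    # 2. Word frequency table over the whole document; keep the most frequent words
    freq = Counter(w for s in sentences for w in words(s))
    significant = {w for w, _ in freq.most_common(top_n_words)}

    # 3. Score each sentence: (number of significant words)^2 / span
    scored = []
    for s in sentences:
        ws = words(s)
        positions = [i for i, w in enumerate(ws) if w in significant]
        if len(positions) < 2:
            continue
        span = positions[-1] - positions[0] + 1
        scored.append((len(positions) ** 2 / span, s))

    # 4-5. Keep the top-scoring sentences, concatenated in their original order
    top = sorted(scored, key=lambda t: -t[0])[:top_n_sentences]
    chosen = {s for _, s in top}
    return " ".join(s for s in sentences if s in chosen)

print(luhn_summary("Fructose malabsorption may cause bloating. "
                   "Fructose malabsorption is diagnosed with a breath test. "
                   "The weather is nice today.", top_n_sentences=2))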

Below I summarize the topic of fructose malabsorption by generating a summary using the Luhn algorithm. To create the summary, I selected several articles from sources like Wikipedia and PubMed. The important words were selected based on their total frequency in all of the text. I chose the top 25 words to focus on, and then used the algorithm to identify the most important sentences based on the frequency and distribution of these words. The summary was generated using the top 15 sentences.

Symptoms and signs of Fructose malabsorption may cause gastrointestinal symptoms such as abdominal pain, bloating, flatulence or diarrhea. Although often assumed to be an acceptable alternative to wheat, spelt flour is not suitable for people with fructose malabsorption, just as it is not appropriate for those with wheat allergies or celiac disease. However, fructose malabsorbers do not need to avoid gluten, as those with celiac disease must. Many fructose malabsorbers can eat breads made from rye and corn flour. This can cause some surprises and pitfalls for fructose malabsorbers. Foods (such as bread) marked “gluten-free” are usually suitable for fructose malabsorbers, though they need to be careful of gluten-free foods that contain dried fruit or high fructose corn syrup or fructose itself in sugar form. Food-labeling Producers of processed food in most or all countries, including the US, are not currently required by law to mark foods containing “fructose in excess of glucose”.

Stone fruit: apricot, nectarine, peach, plum (caution – these fruits contain sorbitol);Berry fruit: blackberry, boysenberry, cranberry, raspberry, strawberry, loganberry; Citrus fruit: kumquat, grapefruit, lemon, lime, mandarin, orange, tangelo; Other fruits: ripe banana, jackfruit, passion fruit, pineapple, rhubarb, tamarillo.

The fructose and glucose contents of foods listed on the Australian food standards would appear to indicate that most of the listed foods have higher fructose levels.

Glucose enhances absorption of fructose, so fructose from foods with fructose-to-glucose ratio <1, like white potatoes, are readily absorbed, whereas foods with fructose-to-glucose ratio >1, like apples and pears, are often problematic regardless of the total amount of fructose in the food.

Foods with high fructose-to-glucose ratio.

Dietary supplements of xylose isomerase may improve some symptoms of fructose malabsorption, although there is currently only a single scientific study available.

This can lead to issues such as bloating, heartburn and constipation.


It is called a hydrogen breath test and is the method currently used for a clinical diagnosis.

People with fructose malabsorption absorb less than 25 g per sitting.

Fructose malabsorption should be considered in patients with symptoms of depression and disturbances of tryptophan metabolism. This prospective cross-sectional study aimed to investigate the potential association between primary-adult lactose malabsorption, fructose malabsorption, tryptophan (TRP) metabolism and the presence of depressive signs.

High intestinal fructose concentration seems to interfere with L-tryptophan metabolism, and it may reduce availability of tryptophan for the biosynthesis of serotonin (5-hydroxytryptamine).

After obtaining the top sentences above, I used them as input to ChatGPT to generate another summary. The summary ended up with some mistakes, such as "People with this condition should avoid foods like spelt flour that contain fructose.", since spelt does not contain fructose; it contains fructans. But it was still a fun exercise.

Fructose malabsorption can cause gastrointestinal symptoms such as abdominal pain, bloating, flatulence, and diarrhea. People with this condition should avoid foods like spelt flour that contain fructose. However, they do not need to avoid gluten, and many can eat breads made from rye and corn flour. Foods marked “gluten-free” are generally suitable for fructose malabsorbers, but they need to be careful of gluten-free foods that contain dried fruit or high fructose corn syrup. Certain fruits such as apricots, nectarines, peaches, and plums can cause issues due to their sorbitol content. The fructose-to-glucose ratio of foods can also play a role in absorption, with foods with a higher ratio often being problematic. A hydrogen breath test is used for clinical diagnosis, and dietary supplements of xylose isomerase may help with symptoms. Fructose malabsorption may also be associated with depressive signs due to the interference with L-tryptophan metabolism.

Filling gluten-free breakfast muffins

Kitchen utensils

I have tried out various kitchen utensils, including muffin pans, loaf pans, a mixer, and a blender, among others. While these items can be useful for baking and food preparation, I found that many of them took too long to clean, making them impractical for everyday use. As a result, I ended up donating all of the utensils that I wasn’t using. The only items that I found to be useful and easy to clean are silicone baking sheets and silicone muffin cups. These utensils are made from a non-stick material that eliminates the need for oil or butter and silicone material also distributes heat evenly.

Silicone baking trays/sheets are primarily made of silicone, which is a synthetic polymer material composed of silicon, oxygen, carbon, and hydrogen. However, some silicone baking trays/sheets may contain additional materials, such as fiberglass or nylon, to provide additional strength and durability.

Fiberglass is sometimes added to the silicone material to reinforce the tray/sheet and prevent it from bending or warping during use. This is especially important for larger or thicker trays/sheets, which may be more prone to deformation under high temperatures or heavy use. Nylon is another material that may be added to the silicone to provide additional durability and heat resistance. Nylon-reinforced silicone trays/sheets are often used in commercial kitchens, as they are more resistant to wear and tear and can withstand high-volume use.

When exposed to heat, silicone responds by retaining its shape and structure without melting or warping. This means that it can be used in high-temperature environments without the risk of deformation or damage. Silicone is also an excellent insulator, which means that it helps to distribute heat evenly across the surface of the tray/sheet.

The other useful utensils that I regularly use are metal bowls for mixing, a metal whisk, and silicone spatulas.

Breakfast muffins


The muffins that I came up with require two bowls for mixing, a metal whisk, a spoon, and silicone muffin cups. These muffins are gluten-free. They are made from gluten-free oats and oat and/or sorghum flour. I also add flax seeds and chia seeds for fiber and nutrition. Sweetness comes from ripe bananas, apple sauce, and honey. I find these muffins very filling; sometimes my breakfast consists of just several muffins and a coffee with oat milk. The recipe also includes two eggs. The eggs, flax seeds, and chia seeds contain protein, the flour contains carbs, and the eggs and butter add fat, so I would say this is a balanced breakfast. The recipe is below:

Bowl 1 – dry ingredients:
1 1/2 cups gluten-free oats
1 1/2 cups oat/sorghum flour
1/4 cup ground flax seeds
1 teaspoon chia seeds
1/4 teaspoon salt
1 teaspoon xanthan gum
3/4 teaspoon baking soda

Mix together all of the dry ingredients in bowl 1

Bowl 2 – wet ingredients:
2 mashed ripe bananas (start with this step first – mash the bananas in bowl 2)
1/2 cup apple sauce
5 tablespoons of melted butter/vegan butter
3 tablespoons peanut butter
3 tablespoons honey
2 eggs
1 cup blueberries

Mix together all of the wet ingredients with a metal whisk, starting with mashing the ripe bananas first. Make sure that everything that you mix is at room temperature. Add blueberries last.

Pour the mixture from bowl 2 into bowl 1, again, mix everything together. Let the final mixture stand for 15 minutes at room temperature. While the mixture is standing, you can turn on the oven to 350 F, so that it starts preheating.

Place silicone muffin cups on a tray. After 15 minutes, the batter is ready; use a spoon to pour the mixture into the silicone muffin cups. Place the tray with the muffin cups into the oven. Bake at 350 F for 45 minutes.

Sequence-to-Sequence and Attention

What are Sequence-to-Sequence models?

Sequence-to-Sequence (Seq2Seq) models are a type of neural network architecture used for natural language processing tasks, such as machine translation, text summarization, and conversational modeling. The basic idea behind Seq2Seq models is to map a variable-length input sequence to a variable-length output sequence.

Seq2Seq models consist of two parts: an encoder and a decoder. The encoder takes an input sequence, such as a sentence, and generates a fixed-length representation of it, called the context vector. The decoder then takes the context vector as input and generates the output sequence, such as a translation of the input sentence into another language. Both encoder and decoder contain multiple recurrent units that take one element as input. The encoder processes the input sequence one word at a time and generates a hidden state h_i for each timestep i. Finally, it passes the last hidden state h_n to the decoder, which uses it as the initial state to generate the output sequence.

In a Seq2Seq model, the hidden state refers to the internal representation of the input sequence that is generated by the recurrent units in the encoder or decoder. The hidden state is a vector of numbers that represents the “memory” of the recurrent unit at each timestep.

Let's consider a simple recurrent unit, such as the Long Short-Term Memory (LSTM) cell. An LSTM cell takes as input the current input vector x_t and the previous hidden state h_{t-1}, and produces the current hidden state h_t as output. The LSTM cell can be represented mathematically as follows:

i_t = sigmoid(W_{ix} x_t + W_{ih} h_{t-1} + b_i)
f_t = sigmoid(W_{fx} x_t + W_{fh} h_{t-1} + b_f)
o_t = sigmoid(W_{ox} x_t + W_{oh} h_{t-1} + b_o)
c_t = f_t * c_{t-1} + i_t * tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)
h_t = o_t * tanh(c_t)

(* denotes element-wise multiplication)
Here, W_{ix}, W_{ih}, W_{fx}, W_{fh}, W_{ox}, W_{oh}, W_{cx}, and W_{ch} are weight matrices, b_i, b_f, b_o, and b_c are bias vectors, sigmoid is the sigmoid activation function, and tanh is the hyperbolic tangent activation function.

At each timestep t, the LSTM cell computes the input gate i_t, forget gate f_t, output gate o_t, and cell state c_t based on the current input x_t and the previous hidden state h_{t-1}. The current hidden state h_t is then computed based on the current cell state c_t and the output gate o_t. In this way, the hidden state h_t represents the internal memory of the LSTM cell at each timestep t. It contains information about the current input x_t as well as the previous inputs and hidden states, which allows the LSTM cell to maintain a “memory” of the input sequence as it is processed by the encoder or decoder.
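To make this concrete, here is a minimal NumPy sketch of a single LSTM step using the gate equations above. The dimensions, initialization, and parameter dictionary are assumptions made for illustration, not a specific framework's API.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep: returns the new hidden state h_t and cell state c_t."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])   # input gate
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])   # forget gate
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])   # output gate
    g_t = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy dimensions and randomly initialized parameters
input_dim, hidden_dim = 4, 8
rng = np.random.default_rng(0)
params = {}
for gate in "ifoc":
    params[f"W_{gate}x"] = rng.normal(size=(hidden_dim, input_dim)) * 0.1
    params[f"W_{gate}h"] = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
    params[f"b_{gate}"] = np.zeros(hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):   # process a toy sequence of length 5
    h, c = lstm_step(x_t, h, c, params)
print(h.shape)   # (8,)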

Encoder and decoder

The Seq2Seq model consists of two parts: an encoder and a decoder. Both of these parts contain multiple recurrent units that take one element as input. The encoder processes the input sequence one word at a time and generates a hidden state h_i for each timestep i. Finally, it passes the last hidden state h_n to the decoder, which uses it as the initial state to generate the output sequence.

The final hidden state of the encoder represents the entire input sequence as a fixed-length vector. This fixed-length vector serves as a summary of the input sequence and is passed on to the decoder to generate the output sequence. The purpose of this fixed-length vector is to capture all the relevant information about the input sequence in a condensed form that can be easily used by the decoder. By encoding the input sequence into a fixed-length vector, the Seq2Seq model can handle input sequences of variable length and generate output sequences of variable length.

The decoder takes the fixed-length vector representation of the input sequence, called the context vector, and uses it as the initial hidden state s_0 to generate the output sequence. At each timestep t, the decoder produces an output y_t and an updated hidden state s_t based on the previous output and hidden state. This can be represented mathematically using linear algebra as follows:

s_t = f(W_s s_{t-1} + U_s y_{t-1} + V_s c + b_s)
y_t = g(s_t)
Here, W_s, U_s, and V_s are weight matrices, b_s is a bias vector, c is the context vector (from the encoder), and f and g are activation functions. The decoder uses the previous output y_{t-1} and hidden state s_{t-1} as input to compute the updated hidden state s_t, which depends on the current input and the context vector. The updated hidden state s_t is then used to compute the current output y_t, which depends on the updated hidden state s_t. By iteratively updating the hidden state and producing outputs at each timestep, the decoder can generate a sequence of outputs that is conditioned on the input sequence and the context vector.

What is the context vector, where does it come from?

In a Seq2Seq model, the context vector is a fixed-length vector representation of the input sequence that is used by the decoder to generate the output sequence. The context vector is computed by the encoder and is passed on to the decoder as the final hidden state of the encoder.

What is a transformer? How are encoders and decoders used in transformers?

The Transformer architecture consists of an encoder and a decoder, similar to the Seq2Seq model. However, unlike the Seq2Seq model, the Transformer does not use recurrent neural networks (RNNs) to process the input sequence. Instead, it uses a self-attention mechanism that allows the model to attend to different parts of the input sequence at each layer.

In the Transformer architecture, both the encoder and the decoder are composed of multiple layers of self-attention and feedforward neural networks. The encoder takes the input sequence as input and generates a sequence of hidden representations, while the decoder takes the output sequence as input and generates a sequence of hidden representations that are conditioned on the input sequence and previous outputs.
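As a rough sketch of the self-attention computation inside a Transformer layer, here is a simplified single-head scaled dot-product attention in NumPy. The dimensions, random inputs, and projection matrices are assumptions made for illustration; real Transformer layers add multiple heads, masking, residual connections, and layer normalization.

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (T, T) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # each position attends to all positions

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 8
X = rng.normal(size=(T, d_model))                        # hidden representations of 5 tokens
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)            # (5, 8)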

Traditional Seq2Seq vs. attention-based models

In traditional Seq2Seq models, the encoder compresses the input sequence into a single fixed-length vector, which is then used as the initial hidden state of the decoder. However, in some more recent Seq2Seq models, such as the attention-based models, the encoder computes a context vector c_i for each output timestep i, which summarizes the relevant information from the input sequence that is needed for generating the output at that timestep.

The decoder then uses the context vector c_i along with the previous hidden state s_i-1 to generate the output for the current timestep i. This allows the decoder to focus on different parts of the input sequence at different timesteps and generate more accurate and informative outputs.

The context vector c_i is computed by taking a weighted sum of the encoder's hidden states, where the weights are learned during training based on the decoder's current state and the input sequence. This means that the context vector c_i is different for each output timestep i, allowing the decoder to attend to different parts of the input sequence as needed. The context vector c_i can be expressed mathematically as:

c_i = Σ_j α_ij h_j
where i is the current timestep of the decoder and j indexes the hidden states of the encoder. The attention weights α_ij are calculated using an alignment model, which is typically a feedforward neural network (FFNN) parametrized by learnable weights. The alignment model takes as input the previous hidden state s_{i-1} of the decoder and the current hidden state h_j of the encoder, and produces a scalar score e_ij:

e_ij = a(s_{i-1}, h_j)

where a is the alignment model. The scores are then normalized using the softmax function to obtain the attention weights α_ij:

α_ij = exp(e_ij) / Σ_k exp(e_ik)

where k indexes the hidden states of the encoder.

The attention weights α_ij reflect the importance of each encoder hidden state h_j with respect to the previous decoder hidden state s_{i-1} in generating the output y_i. The higher the attention weight α_ij, the more important the corresponding hidden state h_j is for generating the output at the current timestep i. By computing a context vector c_i as a weighted sum of the encoder's hidden states, the decoder is able to attend to different parts of the input sequence at different timesteps and generate more accurate and informative outputs.
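Here is a minimal NumPy sketch of this attention computation for a single decoder timestep i. The additive alignment model used here (with weights W_a, U_a, v_a) is one common choice and an assumption of this sketch, as are the dimensions and random inputs.

import numpy as np

rng = np.random.default_rng(0)
T, enc_dim, dec_dim, attn_dim = 6, 8, 8, 16

H = rng.normal(size=(T, enc_dim))         # encoder hidden states h_1..h_T
s_prev = rng.normal(size=dec_dim)         # previous decoder hidden state s_{i-1}

# Alignment model a(s_{i-1}, h_j): a small feedforward network
W_a = rng.normal(size=(attn_dim, dec_dim)) * 0.1
U_a = rng.normal(size=(attn_dim, enc_dim)) * 0.1
v_a = rng.normal(size=attn_dim) * 0.1

e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])   # scores e_ij

alpha = np.exp(e - e.max())
alpha /= alpha.sum()                      # attention weights α_ij (softmax over j)

c_i = alpha @ H                           # context vector: weighted sum of the h_j
print(alpha.round(3), c_i.shape)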

The difference between context vector in Seq2Seq and context vector in attention

In a traditional Seq2Seq model, the encoder compresses the input sequence into a fixed-length vector, which is then used as the initial hidden state of the decoder. The decoder then generates the output sequence word by word, conditioned on the input and the previous output words. The fixed-length vector essentially contains all the information of the input sequence, and the decoder needs to rely solely on it to generate the output sequence. This can be expressed mathematically as:

c = h_n

where c is the fixed-length vector representing the input sequence, and h_n is the final hidden state of the encoder.

In an attention-based Seq2Seq model, the encoder computes a context vector c_i for each output timestep i, which summarizes the relevant information from the input sequence that is needed for generating the output at that timestep. The context vector is a weighted sum of the encoder's hidden states, where the weights are learned during training based on the decoder's current state and the input sequence.

The attention mechanism allows the decoder to choose which aspects of the input sequence to give attention to, rather than requiring the encoder to compress all the information into a single vector and transferring it to the decoder.