Chatting with GPT – Neural Networks

Chat GPT link

*NOTE: I have manually modified some of the equations produced by ChatGPT, such as adding a t-1 subscript

Hello, please tell me what is ChatGPT?

ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model that was specifically designed for use in chatbots and conversational AI systems. It is trained on a large dataset of human-human conversations and is able to generate human-like responses to user input. ChatGPT can be used to build chatbots that can hold natural language conversations with users, helping businesses and organizations automate customer service, sales, and other types of interactions.

Like other GPT models, ChatGPT is a transformer-based language model that uses attention mechanisms to process and generate text. It is trained using a combination of supervised learning, in which the model is given labeled examples of conversation, and unsupervised learning, in which the model is trained to predict the next word in a sequence based on the context of the words that come before it. This allows ChatGPT to learn the patterns and structure of human conversation and generate appropriate responses to user input.

I would like to learn more about machine learning, let’s start with the concept of neural networks. Please explain the concept of a neural network.

A neural network is a type of machine learning model that is inspired by the structure and function of the human brain. It is composed of a large number of interconnected “neurons,” which are simple processing units that receive input, transform it, and produce output.

In a neural network, the input data is fed through a series of hidden layers, each of which is made up of a large number of neurons. These hidden layers process the input data using weighted connections between the neurons. The weights of these connections are adjusted during the training process, allowing the neural network to “learn” from the input data.

The output of the neural network is produced by the output layer, which is also made up of a number of neurons. The output layer takes the processed data from the hidden layers and produces the final output of the neural network.

Neural networks can be used for a wide variety of machine learning tasks, including classification, regression, and clustering. They are particularly powerful for tasks that require the processing of large amounts of data or the detection of patterns and relationships in data that are not immediately apparent.

Overall, the concept of a neural network is a key element of machine learning and is used in many different applications and industries.

How does a neural network look in terms of mathematical equations? What is the architecture?

The architecture of a neural network refers to the way that the neurons are organized and connected within the network. A typical neural network consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the raw input data, and the output layer produces the final output of the network. The hidden layers are located between the input and output layers and are responsible for processing the input data and producing intermediate output that is passed on to the output layer.

In terms of mathematical equations, the computation performed by a neural network can be represented by a series of matrix multiplications and element-wise non-linear functions (also called activation functions).

For example, consider a neural network with one hidden layer, an input layer with m inputs, and an output layer with n outputs. Let’s say that the input data is represented by a matrix X with dimensions N x m (where N is the number of data points, one per row) and the weights between the input layer and the hidden layer are represented by a matrix W1 with dimensions m x p (where p is the number of neurons in the hidden layer). The output of the hidden layer, a matrix with dimensions N x p, is then given by:

h = f(XW1 + b1)

where f is the activation function and b1 is a bias term.

The weights between the hidden layer and the output layer are represented by a matrix W2 with dimensions p x n, and the final output of the neural network is given by:

y = f(hW2 + b2)

where b2 is a bias term.

This is just a simple example, but it illustrates the basic idea of how a neural network can be represented mathematically. In practice, neural networks can have many more layers and a much larger number of neurons, and the equations can become quite complex. However, the basic idea of using matrix multiplications and non-linear activation functions to transform the input data and produce output remains the same.
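The two equations above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: X is stored with one data point per row, the sigmoid is used for f in both layers, and the helper names (matmul, add_bias, apply, forward) and the toy numbers are made up for the example.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(Z, b):
    """Add the bias vector b to every row of Z."""
    return [[z + bj for z, bj in zip(row, b)] for row in Z]

def apply(f, Z):
    """Apply the scalar function f element-wise."""
    return [[f(z) for z in row] for row in Z]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(X, W1, b1, W2, b2, f=sigmoid):
    h = apply(f, add_bias(matmul(X, W1), b1))  # h = f(X W1 + b1), shape N x p
    y = apply(f, add_bias(matmul(h, W2), b2))  # y = f(h W2 + b2), shape N x n
    return h, y

# Toy sizes (assumed): N=2 data points, m=3 inputs, p=2 hidden neurons, n=1 output.
X = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
W1 = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1]]
b1 = [0.0, 0.0]
W2 = [[0.5], [-0.5]]
b2 = [0.0]
h, y = forward(X, W1, b1, W2, b2)
```

In practice a numerical library would do the matrix products, but the shapes work out the same way: N x m times m x p gives N x p, and N x p times p x n gives N x n.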

What exactly do you mean by activation function? What are some examples of such functions?

An activation function is a mathematical function that is applied to the output of a neuron in a neural network. It determines whether the neuron will “fire” and produce an output, or whether it will remain inactive. Activation functions are a key component of neural networks because they allow the network to learn non-linear relationships in the data.

There are many different activation functions that can be used in neural networks, each with its own properties and characteristics. Some common examples include:

Sigmoid: The sigmoid function is a smooth, S-shaped curve that maps any input value to a value between 0 and 1. It is often used in the output layer of a binary classification neural network because it can produce a probability-like output.

Tanh: The tanh (hyperbolic tangent) function is also a smooth, S-shaped curve, but it maps input values to a range of -1 to 1. It is similar to the sigmoid function, but it allows for negative output values as well.

ReLU (Rectified Linear Unit): The ReLU function is a piecewise linear function that maps negative input values to 0 and positive input values to the same value. It is a simple and efficient activation function that has been widely adopted in many neural network architectures.

Leaky ReLU: The leaky ReLU function is a variant of the ReLU function that allows a small negative slope for negative input values. This can help prevent the “dying ReLU” problem, in which some neurons in the network become inactive and stop producing output.

Softmax: The softmax function is a generalization of the sigmoid function that is often used in the output layer of a neural network for multi-class classification. It maps the input values to a probability distribution over the different classes, such that the sum of the probabilities is 1.

These are just a few examples of the many activation functions that are used in neural networks. The choice of activation function can have a significant impact on the performance of the network, and different activation functions may be better suited for different types of tasks and data.
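As a rough illustration, the five functions listed above can be written as follows. The leaky ReLU slope of 0.01 is an assumed default, not a value given in the conversation, and the softmax subtracts the maximum input only for numerical stability.

```python
import math

def sigmoid(z):
    """Maps any real input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Maps any real input to a value between -1 and 1."""
    return math.tanh(z)

def relu(z):
    """Negative inputs become 0; positive inputs pass through unchanged."""
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope (assumed 0.01 here)."""
    return z if z > 0 else slope * z

def softmax(zs):
    """Maps a list of scores to probabilities that sum to 1."""
    m = max(zs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```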

So how would I choose an activation function then? Would each layer have a different activation function?

The choice of activation function can have a significant impact on the performance of a neural network, and it is an important design decision that can affect the network’s ability to learn from the data and generalize to new examples.

There is no one-size-fits-all activation function that is best for all situations, and it is often necessary to experiment with different activation functions to find the one that works best for a particular task. In general, it can be helpful to consider the properties of the activation function and how they align with the characteristics of the data and the goals of the task.

For example, if you are building a neural network for a binary classification task, you may want to use a sigmoid activation function in the output layer, since it produces a probability-like output between 0 and 1 (tanh, by contrast, outputs values between -1 and 1, so its output cannot be read directly as a probability). On the other hand, if you are building a neural network for a multi-class classification task, you may want to use a softmax activation function in the output layer, since it can produce a probability distribution over the different classes.

As for the hidden layers of the network, ReLU and its variants (such as Leaky ReLU) are often a good choice, since they are simple and efficient and have been shown to work well in a wide range of tasks. However, other activation functions may also be suitable, depending on the characteristics of the data and the specific goals of the task.

In general, it is possible to use different activation functions in different layers of a neural network, and this can sometimes be helpful to improve the network’s performance. However, it is also important to be mindful of the complexity of the network and to avoid adding unnecessary complexity that could harm the network’s ability to learn effectively.

So once I have my data inputs, my data outputs, and I choose my activation functions, how do I choose the number of layers in my neural network?

The number of layers in a neural network, as well as the number of neurons in each layer, are important design choices that can impact the network’s ability to learn from the data and generalize to new examples. In general, the more layers and neurons a network has, the more capacity it has to learn from the data. However, too many layers and neurons can also make the network more prone to overfitting, which is when the network performs well on the training data but poorly on new, unseen examples.

There is no hard and fast rule for choosing the number of layers and neurons in a neural network, and it is often necessary to experiment with different configurations to find the one that works best for a particular task. However, there are a few guidelines that you can follow to help you make informed decisions:

  • Start with a small number of layers and neurons and gradually increase them for as long as the network’s performance on held-out validation data keeps improving. This can help you avoid the problem of overfitting, since a smaller network is less likely to overfit the data.
  • Consider the complexity of the task and the amount of data you have available. For more complex tasks or larger datasets, you may need a deeper network (i.e., one with more layers) to learn effectively. On the other hand, for simpler tasks or smaller datasets, a shallower network may be sufficient.
  • Experiment with different architectures and compare the performance of the different models. This can help you identify the best-performing configuration for your task.

Ultimately, the choice of the number of layers and neurons in a neural network is a trade-off between model capacity and generalization. You want to choose a configuration that has enough capacity to learn from the data, but not so much that it overfits the data. Finding the right balance can require some experimentation and trial and error, but with the right approach, you can build a neural network that performs well on your task.

Once I have chosen the number of layers, the number of neurons in each layer, and the activation functions, how do I train the model? How will the weights be calculated?

To train a neural network, you need to adjust the weights of the connections between the neurons in such a way that the output of the network is as close as possible to the desired output for a given set of inputs. The process of adjusting the weights to minimize the difference between the predicted output and the desired output is known as optimization.

There are several optimization algorithms that can be used to train a neural network, each with its own pros and cons. Some common optimization algorithms include:

  • Stochastic Gradient Descent (SGD): SGD is a simple and widely-used optimization algorithm that involves updating the weights of the network based on the gradient of the loss function with respect to the weights. The loss function is a measure of how well the network is doing at predicting the desired output given the input data, and the gradient is a vector that points in the direction of the steepest increase in the loss function. SGD involves repeatedly updating the weights in the opposite direction of the gradient, in an attempt to minimize the loss function and improve the performance of the network.
  • Adam (Adaptive Moment Estimation): Adam is a variant of SGD that uses an adaptive learning rate, which means that the step size for updating the weights is adjusted based on the past gradient information. Adam tends to be more efficient and to require less hyperparameter tuning than SGD.
  • Mini-batch gradient descent: Mini-batch gradient descent is a variant of SGD that involves dividing the training data into small batches and updating the weights based on the average of the gradients of the loss function for each batch. This can be more efficient than SGD, especially for large datasets, since it allows the network to learn from the data more quickly.
  • RProp (Resilient Backpropagation): RProp is a gradient-based algorithm that adjusts a separate step size for each weight, using only the sign of the current and previous gradients rather than their magnitude. It can converge faster than plain gradient descent when the gradient is computed on the full training set, but it is generally less well suited to mini-batch training on large datasets.

Please explain stochastic gradient descent further, include equations

Stochastic Gradient Descent (SGD) is a widely-used optimization algorithm for training neural networks and other machine learning models. It involves updating the weights of the model based on the gradient of the loss function with respect to the weights. The loss function is a measure of how well the model is doing at predicting the desired output given the input data, and the gradient is a vector that points in the direction of the steepest increase in the loss function.

SGD works by iteratively updating the weights of the model in the opposite direction of the gradient, in an attempt to minimize the loss function and improve the performance of the model. The update rule for SGD can be expressed as follows:

w = w – alpha * gradient

where w is the weight of the model, alpha is the learning rate (a hyperparameter that controls the step size for updating the weights), and gradient is the gradient of the loss function with respect to the weight.

The learning rate is an important hyperparameter in SGD, and it determines the step size for updating the weights. A larger learning rate can lead to faster convergence, but it can also make the optimization process more unstable. On the other hand, a smaller learning rate can lead to slower convergence, but it can also make the optimization process more stable.

SGD can be implemented in a number of ways, and it is often used in conjunction with other techniques, such as mini-batch gradient descent or momentum, to improve the efficiency and stability of the optimization process.
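To make the update rule w = w – alpha * gradient concrete, here is a small sketch that fits a line y = w * x + b to toy data, updating after every single example. The squared-error loss, the data, the learning rate of 0.05, and the function name sgd_linear are all assumptions chosen for the illustration.

```python
import random

def sgd_linear(data, alpha=0.05, epochs=200, seed=0):
    """Fit y = w * x + b with per-example SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)              # "stochastic": visit examples in random order
        for x, y in data:
            err = (w * x + b) - y      # prediction error for this one example
            grad_w = 2 * err * x       # d(err^2)/dw
            grad_b = 2 * err           # d(err^2)/db
            w = w - alpha * grad_w     # w = w - alpha * gradient
            b = b - alpha * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (no noise).
data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = sgd_linear(data)
```

With a much larger alpha the same loop can overshoot and diverge, which is the instability mentioned above.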

You also mentioned ADAM – Adaptive Moment Estimation, how does this algorithm differ from SGD? Please provide equations.

Adam (Adaptive Moment Estimation) is a variant of Stochastic Gradient Descent (SGD) that uses an adaptive learning rate, which means that the step size for updating the weights is adjusted based on the past gradient information. Adam is a popular optimization algorithm that is often used to train neural networks and other machine learning models.

One of the key differences between Adam and SGD is that Adam uses an exponentially weighted average of the past gradients to scale the learning rate, rather than a fixed learning rate as in SGD. This allows Adam to automatically adjust the learning rate based on the characteristics of the data and the optimization process, which can lead to faster convergence and better performance.

The update rule for Adam can be expressed as follows:

m = beta1 * m_t-1 + (1 – beta1) * gradient

v = beta2 * v_t-1 + (1 – beta2) * gradient^2

w = w_t-1 – alpha * m / (sqrt(v) + epsilon)

where w is the weight of the model, alpha is the learning rate (a hyperparameter that controls the step size for updating the weights), m and v are the exponentially weighted averages of the past gradients and squared gradients, respectively, beta1 and beta2 are hyperparameters that control the decay rate of the moving averages, and epsilon is a small value that is used to prevent division by zero.

Adam uses two moving averages, one for the gradients and one for the squared gradients, which allows it to incorporate both the mean and the variance of the past gradients into the learning rate. This can help to stabilize the optimization process and prevent oscillations or divergences.
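The three equations can be sketched for a single weight as below. One difference should be flagged: the sketch also applies the bias-correction terms (m_hat and v_hat) from the original Adam formulation, which the equations above leave out. The values beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8 are the commonly used defaults; alpha = 0.1, the toy loss, and the function name adam_minimize are assumptions for the example.

```python
import math

def adam_minimize(grad_fn, w, alpha=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction (not in the text above)
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

# Toy loss (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
```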

You also listed RProp (Resilient Backpropagation) as an optimization algorithm, please explain RProp with equations.

Resilient backpropagation (RProp) is an optimization algorithm used in artificial neural networks for training supervised learning models. It is an extension of the backpropagation algorithm that aims to improve the convergence speed of the training process by adapting the learning rate for each weight in the network during training.

Like backpropagation, RProp uses the gradient of the loss function to update the weights of the network. The gradient is calculated using the chain rule and the weights are updated using the following equation:

weight(t+1) = weight(t) – learning_rate * gradient(t)

In standard backpropagation with gradient descent, the learning rate is a fixed hyperparameter that is set before training. In RProp, each weight has its own step size, which is adapted during training based on the sign of the gradient of the loss function with respect to that weight. If the sign stays the same from one iteration to the next, the step size for that weight is increased; if the sign flips, which suggests that the last update overshot a minimum, the step size is decreased. Because each step size follows the convergence of its own weight rather than that of the network as a whole, this can improve the convergence speed of the training process.

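The step size for each weight is updated using the following rule:

step_size(i)_t = step_size(i)_t-1 * eta_plus, if (dE_t-1 / dw(i)_t-1) * (dE_t / dw(i)_t) > 0

step_size(i)_t = step_size(i)_t-1 * eta_minus, if (dE_t-1 / dw(i)_t-1) * (dE_t / dw(i)_t) < 0

Each weight is then moved by its own step size in the direction opposite to the sign of its gradient:

w(i)_t+1 = w(i)_t – step_size(i)_t * sgn(dE_t / dw(i)_t)

where E is the loss, sgn is the sign function, and eta_plus > 1 and 0 < eta_minus < 1 are hyperparameters (1.2 and 0.5 are commonly used values).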
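A minimal sketch of this idea, assuming a simplified RProp variant without weight backtracking: each weight keeps its own step size, which grows while the sign of its gradient stays the same and shrinks when the sign flips. The factors 1.2 and 0.5 are commonly used values; the step-size bounds, the toy loss, and the function names are assumptions for the example.

```python
def sign(x):
    return (x > 0) - (x < 0)

def rprop_minimize(grad_fn, w, step=0.1, eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0, iters=100):
    """Simplified RProp: only the sign of each partial derivative is used."""
    w = list(w)
    steps = [step] * len(w)      # one step size per weight
    prev = [0.0] * len(w)        # gradient from the previous iteration
    for _ in range(iters):
        g = grad_fn(w)
        for i in range(len(w)):
            if prev[i] * g[i] > 0:       # same sign as before: speed up
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif prev[i] * g[i] < 0:     # sign flipped: we overshot, slow down
                steps[i] = max(steps[i] * eta_minus, step_min)
            w[i] -= sign(g[i]) * steps[i]
            prev[i] = g[i]
    return w

# Toy loss (w0 - 1)^2 + 10 * (w1 + 2)^2 with very different gradient scales.
w = rprop_minimize(lambda w: [2 * (w[0] - 1.0), 20 * (w[1] + 2.0)], [0.0, 0.0])
```

Because only the sign is used, the tenfold difference in gradient magnitude between the two weights does not affect how far each one moves.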

I have also often heard the AdaGrad optimizer mentioned, please explain that to me

In AdaGrad, the learning rate is not fixed and is instead adapted for each weight based on the history of the gradients of the loss function with respect to that weight. Specifically, the learning rate is adjusted based on the sum of the squares of the gradients of the loss function with respect to the weight.

The effective learning rate for each weight at step t is given by the following equation:

effective_learning_rate(t) = learning_rate / (sqrt(sum(gradient^2)) + epsilon)

where learning_rate is the fixed base learning rate, sum(gradient^2) is the sum of the squares of the gradients of the loss function with respect to the weight up to time t, and epsilon is a small value that is used to prevent division by zero.

This means that the effective learning rate for each weight is inversely proportional to the square root of the sum of the squares of the gradients of the loss function with respect to that weight. This has the effect of shrinking the learning rate quickly for weights that have had large gradients in the past, and keeping it relatively large for weights that have had small gradients in the past. Because the sum can only grow, the effective learning rate never increases during training.
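Here is a short sketch of AdaGrad on a toy loss with two weights, assuming the usual form in which a fixed base learning rate alpha is divided by the square root of the accumulated squared gradients. The value alpha = 0.5, the small epsilon, the toy loss, and the function name are assumptions for the example.

```python
import math

def adagrad_minimize(grad_fn, w, alpha=0.5, epsilon=1e-8, iters=500):
    """AdaGrad: each weight's step shrinks as its squared gradients accumulate."""
    w = list(w)
    g2_sum = [0.0] * len(w)      # running sum of squared gradients, per weight
    for _ in range(iters):
        g = grad_fn(w)
        for i in range(len(w)):
            g2_sum[i] += g[i] ** 2
            w[i] -= alpha / (math.sqrt(g2_sum[i]) + epsilon) * g[i]
    return w, g2_sum

# Toy loss (w0 - 1)^2 + 10 * (w1 + 2)^2: the second weight sees larger gradients.
w, g2_sum = adagrad_minimize(
    lambda w: [2 * (w[0] - 1.0), 20 * (w[1] + 2.0)], [0.0, 0.0])
```

The weight with the larger gradients ends up with the larger accumulated sum and therefore the smaller effective learning rate.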

I have also heard of gradient boosting, explain that

Gradient boosting is a machine learning technique for building predictive models by creating an ensemble of weak learners and combining their predictions to create a stronger, more accurate model. The general idea of gradient boosting is to train a sequence of weak learners (e.g. decision trees) in a stage-wise manner, with each subsequent learner trying to correct the errors made by the previous learner. At each stage, the gradient boosting algorithm fits a new weak learner to the residual errors made by the previous learner. The residual errors are the difference between the true values and the predicted values of the target variable.

Unlike AdaBoost, gradient boosting does not reweight incorrectly classified samples. Instead, the residual errors themselves direct each new weak learner toward the samples that the current ensemble predicts poorly. For squared-error loss, these residuals are the negative gradient of the loss with respect to the current predictions, which is where the name gradient boosting comes from.

The steps of the gradient boosting algorithm can be summarized in the following equations:

  1. Initialize the ensemble with a single weak learner, such as a decision tree, and make predictions using this learner:

prediction(1) = weak_learner_prediction(1)

  2. Calculate the residual errors between the true values and the predicted values of the target variable:

residual_error(t) = true_value(t) – prediction(t)

  3. Train a new weak learner to predict the residual errors from the previous step:

weak_learner_prediction(t+1) = train(weak_learner, residual_error(t))

  4. Update the predictions of the ensemble by adding the predictions of the new weak learner to the previous predictions:

prediction(t+1) = prediction(t) + weak_learner_prediction(t+1)

  5. Repeat steps 2-4 until the desired number of weak learners has been trained.
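The loop above can be sketched for a one-dimensional regression problem with decision stumps as weak learners. Two things here are assumptions that go beyond the steps as listed: the ensemble starts from the mean of the targets rather than from a tree, and each stump's contribution is scaled by a learning_rate (shrinkage) factor. The data and function names are made up for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump: predict the mean residual on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def gradient_boost(xs, ys, n_learners=20, learning_rate=0.5):
    base = sum(ys) / len(ys)                 # initial prediction (assumed: the mean)
    preds = [base] * len(xs)
    learners = []
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, preds)]   # true value - prediction
        stump = fit_stump(xs, residuals)                 # new learner fits the residuals
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in learners)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each added stump can only lower the training error here, so the number of learners and the shrinkage factor are the main knobs for trading fit against overfitting.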

What is AdaBoost and what are the steps of the algorithm? Include equations.

AdaBoost works by weighting the observations in the training set and adjusting the weight of each observation at each iteration of the algorithm based on the performance of the previous models. The idea behind AdaBoost is to train a sequence of weak learners in an iterative manner, with each learner trying to correct the mistakes of the previous learner. The final model is a weighted average of all the weak models.

Here are the steps of the AdaBoost algorithm:

  1. Initialize the weights of the observations in the training set:
    • At the beginning, all the observations in the training set are given the same weight.
  2. Train a weak learner:
    • Train a weak learner (e.g., a decision tree) on the training set using the current weights of the observations (My note: sample with replacement).
  3. Calculate the error of the weak learner:
    • Calculate the error (my note: total error) on the training set using the following equation:
      error = sum(weights[i] * I[prediction[i] != y[i]])
    where I is the indicator function that returns 1 if the prediction is incorrect and 0 if the prediction is correct, weights is the array of weights for the observations, prediction is the array of predictions made by the weak learner, and y is the array of true labels for the observations.
  4. Calculate the weight of the weak learner:
    • Calculate the weight of the weak learner using the following equation:
      alpha = 0.5 * log((1 – error) / error)
  5. Adjust the weights of the observations:
    • Adjust the weight of each observation i using the following equation:
      weights[i] = weights[i] * exp(alpha) if prediction[i] != y[i], otherwise weights[i] * exp(-alpha)
  6. Normalize the weights:
    • Normalize the weights so that they sum up to 1.
  7. Repeat steps 2-6:
    • Repeat steps 2-6 a predetermined number of times or until the error of the model becomes small enough.
  8. Return the final model:
    • Return the final model as a weighted average of all the weak models.
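The eight steps can be sketched with one-dimensional decision stumps and labels coded as -1 and +1. This is an illustration under assumptions: the weak learner uses the observation weights directly instead of resampling, and the weight update is written as exp(-alpha * y * prediction), which multiplies the weight by exp(-alpha) for a correct prediction and by exp(alpha) for a mistake. The data and function names are made up.

```python
import math

def fit_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n                                     # step 1
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)  # steps 2-3
        error = min(max(error, 1e-10), 1 - 1e-10)               # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)             # step 4
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x <= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)                 # step 5
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]                  # step 6
    return ensemble                                             # step 8

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * (pol if x <= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
accuracy = sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
```

No single stump gets this toy set right, but the weighted vote of a few stumps can, which is the point of combining weak learners.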

My note:

Short note on the weight update in step 5 (here “performance” means alpha, the weight of the last stump).

Misclassification: When the performance is relatively large, the last stump did a good job of classifying the records, so the new sample weight will be much larger than the old one. When the performance is relatively small, the last stump did not do a good job of classifying the records, so the new sample weight will be only a little larger than the old one.

No misclassification: When the performance is relatively large, the last stump did a good job of classifying the records, so the new sample weight will be much smaller than the old one. When the performance is relatively small, the last stump did not do a good job of classifying the records, so the new sample weight will be only a little smaller than the old one.

What is the difference between gradient boosting algorithm and adaptive boosting?

Gradient boosting and adaptive boosting are both ensemble learning algorithms that can be used to improve the performance of a weak learner (a model that performs only slightly better than random guessing) by combining the predictions of multiple weak learners.

The main difference between gradient boosting and adaptive boosting is the way in which the weak learners are trained and combined.

In gradient boosting, the weak learners are trained sequentially, with each learner fit to the residual errors (more generally, the negative gradient of the loss function) of the ensemble built so far. In adaptive boosting (also known as AdaBoost), the weak learners are also trained sequentially, but the training process is different. Each weak learner is trained on a reweighted (or resampled) version of the training data, and the weights of the training examples are adjusted after each learner is trained to give more weight to the examples that were misclassified. The final prediction is made by combining the predictions of all of the learners using a weighted sum, where the weight of each learner is determined by its weighted error on the training data.

Advent of Code Day 5 – my bonus question

I am doing the Advent of Code. So far I have solved all the questions for the four previous days and part one of the question for day five. I have also created my own question for fun, the question is below:

After many hours of walking, the Elves come to a forest glade. They are quite tired and hungry. One of the Elves suddenly notices that the glade is full of mushrooms. The Elves are familiar with this mushroom species – they are edible and quite tasty. The Elves pick all of the mushrooms and are almost ready to make mushroom soup, when they remember one tricky problem – there is a poisonous mushroom species that looks very similar, and often a poisonous mushroom will grow right among the edible mushrooms.

At this point the Elves have determined the molecular structure of each mushroom that they picked. The structure always consists of five segments and each segment consists of a number and a letter.

Example: 0.9H 0.08G 0.27L 0.57M 0.84P

Each letter molecule (A – Z) has a corresponding weight, from 0 to 25. The numbers also represent additional weight units. It is therefore possible to calculate the molecular weight of each mushroom. In the above example the weight would be 0.9 + 7 + 0.08 + 6 + 0.27 + 11 + 0.57 + 12 + 0.84 + 15 = 53.66

If the structure had a negative number, such as if it would be 0.9H -0.08G 0.27L 0.57M 0.84P, then the negative segment would need to be subtracted. The weight then would be 0.9 + 7 – 0.08 – 6 + 0.27 + 11 + 0.57 + 12 + 0.84 + 15 = 41.5

The Elves are aware that the value of each segment of a mushroom comes from a process generated by ~N(12.5, 4.5) and there is no correlation between the segments. (The value of the segment is number + letter, for example 0.9H is 7.9, while -0.08G is -6.08).
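The weight calculation described above can be sketched as follows. It only parses a structure and adds up the segment values; it does not attempt to find the poisonous mushroom. The function names are made up, and the sketch assumes every segment is a number followed by a single uppercase letter.

```python
def segment_value(segment):
    """Value of one segment such as '0.9H' (7.9) or '-0.08G' (-6.08)."""
    number, letter = float(segment[:-1]), segment[-1]
    letter_weight = ord(letter) - ord("A")   # A = 0, B = 1, ..., Z = 25
    # A negative number means the whole segment is subtracted.
    return number - letter_weight if number < 0 else number + letter_weight

def molecular_weight(structure):
    """Sum of the segment values of a mushroom's structure."""
    return sum(segment_value(s) for s in structure.split())

w1 = molecular_weight("0.9H 0.08G 0.27L 0.57M 0.84P")
w2 = molecular_weight("0.9H -0.08G 0.27L 0.57M 0.84P")
```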

The mushroom that is poisonous is definitely tricky to find for the Elves because it looks exactly the same as the edible mushrooms. BUT! The molecular structure of this mushroom gives it away! It is very unlikely that such structure would be generated by the same process as for the edible mushrooms. Find the poisonous mushroom from the input list so that the Elves can start cooking their soup.

The list of mushrooms is in the link below:

Advent of Code Day 5 bonus question input

Postpartum psychosis stories

I think it’s very important to talk about a condition that not everyone is aware of – postpartum psychosis. Postpartum psychosis is psychosis that occurs sometime after a woman gives birth; it’s a separate condition from postpartum depression. Below I have posted links to several stories of postpartum psychosis. One important message stated was that a woman would likely not be aware that she is suffering from psychosis. The delusions/hallucinations that occur during psychosis feel very real to the patient. If a woman had never experienced psychosis before and maybe was not even aware of the existence of postpartum psychosis, it would be quite difficult for her to understand in that state that what she is experiencing is not real and is a serious medical condition.

I think it’s very useful for everyone to hear the stories below and to be aware of what the symptoms are. During psychosis the person can have auditory hallucinations and hear voices, and they can also feel that objects/people are sending them messages. In one of the stories a woman described how during her psychotic episode she heard voices coming from the radiator and thought that different colours were sending her messages. Insomnia can also be a symptom of psychosis and can exacerbate the lack of sleep caused by having a newborn, and lack of sleep in turn can exacerbate psychotic symptoms. Another woman tells her story of how she started to see her mom, who had actually passed away. She also hallucinated a man in her house who she thought was planning to kill her. Manic symptoms can also occur, such as believing that there is a divine intervention and that you are now able to know things that others can’t, believing that you can achieve anything, such as learning a language overnight, as well as rapid speech and sleeping only a few hours a day.

Postpartum psychosis – Katy’s story

Postpartum psychosis – story #2

Postpartum psychosis – story #3

Off-topic, using NLP to summarize Russian propaganda articles on the invasion of Ukraine

I have a simple article summarizer in Python that I use for my blog. It’s not machine learning; it’s quite basic: it collects N articles based on a search query, cleans up the text, tokenizes it, and removes stop words. Then word frequencies are calculated and each sentence is scored based on those frequencies. The top N sentences are selected for the summary.
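A rough sketch of this kind of frequency-based summarizer is below. It is not my actual script: it skips the article collection step, uses a tiny made-up stop word list, and assumes that a sentence’s score is the sum of the frequencies of its words and that the selected sentences are returned in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; a real list would be longer).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "for", "on", "with"}

def tokens(sentence):
    """Lowercase words of a sentence, with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]

def summarize(text, top_n=2):
    # Split into sentences on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in tokens(s))       # word frequencies
    scores = [sum(freq[w] for w in tokens(s)) for s in sentences]  # sentence scores
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]          # keep original order

text = ("Magnesium supports sleep. Magnesium and zinc support sleep quality. "
        "The weather was nice.")
summary = summarize(text, top_n=1)
```

Summing frequencies favours long sentences; dividing the score by the sentence length would be one way to offset that.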

I usually use it to summarize articles on some specific supplements, but today for fun I decided to try a different search query. My query was “путин россия украина спецоперация” (“putin russia ukraine special operation”). Even though this summarizer is simple, the summary turned out to be quite meaningful. It gives a good overview of what sort of info you would get if you were reading Russian news. I selected the top 15 sentences for the summary.

Что касается слов президента о спецоперации на Донбассе, то они показывают, что наши военнослужащие снова спасают мир от коричневой чумы нацизма, и все поставленные задачи, вне всякого сомнения, будут выполнены. Верховный главнокомандующий еще раз подтвердил, спецоперация на Украине будет доведена до своего логического завершения по тем задачам, которые он озвучил 24 февраля, в день начала СВО», — сказал полковник. И президент в своей речи указал на союзников из Европы, Азии, Африки и Латинской Америки, которые не прогибаются перед так называемым гегемоном, выбирают суверенный путь развития и хотят коллективно решать вопросы безопасности и сформировать многополярный мир.

Украинские издания утверждают, что СНБО одобрил санкции против патриарха Кирилла, а также Сергея Кириенко, Евгения Пригожина, Романа Абрамовича, Олега Дерипаски, Михаила Фридмана, Виктора Медведчука, а также Виктора Януковича. Наш долг перед памятью миллионов жертв Второй мировой войны — жестко реагировать на попытки фальсификации истории, противодействовать распространению любых форм неонацизма, русофобии и расизма», — призвал президент. Все указывает на то, что временной, человеческий, материальный, дипломатический ресурс «спецоперации» близится к исчерпанию, и Путин делает решительный шаг, чтобы закончить как можно скорее, зафиксировав прибыли и убытки. “Наша задача, наша миссия, солдат, ополченцев Донбасса – эту войну прекратить, защитить людей и, конечно, защитить саму Россию”, – подчеркнул Путин.

Выступление Путина демонстрирует готовность Вооруженных сил России довести специальную военную операцию на Украине до победного конца, пояснил URA.RU военный эксперт, ветеран разведки, полковник Анатолий Матвийчук. Война, развязанная Западом и киевской хунтой, будет закончена», — заключил Красов. Российское высокоточное оружие опережает иностранные аналоги на годы и десятилетия, при этом значительно превосходит их по тактико-техническим характеристикам, а оборонная промышленность лежит в основе суверенитета России, отметил президент.

Speaking of steps taken by the West, Vasily Piskarev, chairman of the State Duma commission investigating foreign interference in Russia's internal affairs, earlier stated that foreign NGOs are imposing radical ideologies on Russian society, RAPSI reports. "First the collective West went out of its way to prove that it had supposedly 'torn the Russian economy to shreds'; now they think they have isolated our country from the rest of the world. The Russian Armed Forces reliably protect their country and bring freedom to other peoples, Putin said, opening the Army-2022 forum and the International Army Games 2022. "Russia has several military options in reserve, and for the US and its NATO allies these scenarios are a serious cause for concern," the magazine writes. Although the West was encouraged by the recent "successes" of the Ukrainian Armed Forces, US General Mark Milley urged caution about Russia's unpredictability.

Dealing with depression after encephalitis. After many years of trials, this is my current depression regimen, just wanted to share.

Hello everyone, I just wanted to share my current depression regimen and some situation info, in case anyone has similar health issues. I have experienced many hospitalizations since 2015, including involuntary psychiatric hospitalizations. Finally in 2017 I was diagnosed with autoimmune encephalitis (brain inflammation), as well as autoimmune thyroiditis. I was treated with intravenous corticosteroids and that led to some improvement. I continue to experience health issues, but I have made several lifestyle changes that have helped me and that I wanted to share. Again, I was diagnosed with autoimmune disease, and my neuropsychiatrist believes that the encephalitis greatly contributed to my depression. Clearly it’s not the case for everyone, so I am not stating that this should work for all. I have been doing better since these changes: I was able to complete a graduate degree, get back to painting, and start writing and playing guitar again. These were huge improvements for me, as I was not able to enjoy any hobbies or pursue graduate courses when I had severe depression.

  1. I cut out all refined carbs and processed foods. There is sufficient evidence indicating that these foods contribute to inflammation. I am not doing keto or low carb, I am not trying to be very strict with myself, I enjoy all sorts of complex carbs such as baked plantains, potatoes, oatmeal, fruits, berries, etc.
  2. Switched to low glycemic foods – this relates to #1, as cutting out refined simple carbs in general does leave one with complex carbs that have a lower glycemic index.
  3. Foods that cause an immune reaction – this clearly does not occur for most people, but some do react to certain foods. I noticed that I feel physically and emotionally worse after eating gluten, dairy, or soy, so I had to drop these from my diet.
  4. I go to sleep earlier and stay away from my laptop/phone screen after 9pm. I used to stay up late, but now I go to bed around 11pm. After 9pm I usually dim the lights in the room a bit and I read on my Kindle. Kindle Paperwhite does not emit a high amount of blue light. I also installed blackout curtains so that I spend the night sleeping in the dark.
  5. Sleep is very important – so when I really can’t fall asleep, I do use a cannabis oil (NightNight CBN + CBD oil). But changing my diet, losing weight, and going to bed earlier, did reduce my insomnia, so I don’t need the oil every day.
  6. Significantly decreasing my caffeine intake – personally for me it did lower my anxiety and the occurrence of panic attacks, I now only have green tea in the afternoon, otherwise I drink rooibos tea, water, kefir, decaf tea.
  7. Intermittent fasting – I do feel less brain fog and more clear-headed when I am not eating the whole day. I used to surf the internet at 1am eating Sweet & Salty bars. Then my mind would go into dark places and I would start reading about serial killers. Now I eat two to three meals a day between 9am and 5pm, and I fast for 16-18 hours a day.
  8. Seeing a psychologist – going through CBT and DBT did help, and this relates to #5. I still experience racing thoughts, anxiety, and other issues, but I can now more easily choose to not follow my thoughts. For example – I used to read a lot about US serial killers and then I would freak myself out and start to think that someone could climb through the window. Now I am more selective about what I read – should I keep reading about mass murders? What is the point of that for me? Will that change anything for the better?
  9. Sunlight – I try to get some sunlight each morning. If I have no energy to go out, I still lean out of the window and get some sunlight on my face.
  10. Exercise – I experience certain pains due to autoimmune disease, and fatigue, so I don’t do extensive exercise, but I do yoga at home. And by exercise I don’t mean that I do a whole hour after work, I do certain yoga poses occasionally throughout the day. I think that’s still better than no exercise.
  11. Shrooms – I did several shroom trips, at home alone, after I was treated for encephalitis. I haven’t done shrooms for a while due to pregnancy and breastfeeding, but the positive antidepressant effects of the trips still remain for me.
  12. CBT, again – accepting that some days are better than others, some are worse, but also seeing the positive – in general I am doing much much better now than in 2016. I am female, hormones fluctuate, I do feel worse during the luteal phase, but I experience a lot more enjoyable moments than before my steroids treatment and this lifestyle change.

Eggshells – a cheap non-dairy source of calcium

If you are going to adhere to a dairy free diet, you are going to need to find another source of calcium. Humans require calcium for muscle contraction, blood clotting, normal heart rhythm, as well as nerve functions. The recommended dietary allowance (RDA) for calcium is around 1000mg per day. The RDA is elevated to 1,300 milligrams per day during adolescence because this is the life stage with accelerated bone growth.

For women above age fifty and men older than seventy-one, the RDAs are also a bit higher for several reasons including that as we age, calcium absorption in the gut decreases, vitamin D3 activation is reduced, and maintaining adequate blood levels of calcium is important to prevent an acceleration of bone tissue loss (especially during menopause). Results of some large trials found that higher calcium intakes (usually achieved with a supplement) were associated with improved bone density and a slightly lower risk of hip fractures.

Now in regards to eggshells – most people probably don’t eat them, but eggshells are an excellent source of calcium. Eggshells are also a natural source of other elements such as strontium and fluorine. Approximately half an eggshell from a 42g egg would provide 750 mg of calcium, which is 75% of the daily recommended value of 1000mg. Clinical and experimental studies showed that eggshell powder has positive effects on bone and cartilage and that it is suitable in the prevention and treatment of osteoporosis.
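The arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in this post (750 mg of calcium per half eggshell, and RDAs of 1000 mg and 1300 mg); the function names are made up for illustration, and it assumes the calcium content scales linearly with the amount of shell.

```python
# Figure from the post: half an eggshell (from a 42 g egg) gives about 750 mg of calcium.
CALCIUM_PER_HALF_SHELL_MG = 750
CALCIUM_PER_SHELL_MG = 2 * CALCIUM_PER_HALF_SHELL_MG  # assumes linear scaling

# RDAs quoted in the post, in mg per day.
RDA_MG = {"adult": 1000, "adolescent": 1300}


def shells_needed(target_mg):
    """Number of eggshells that would supply target_mg of calcium."""
    return target_mg / CALCIUM_PER_SHELL_MG


def percent_of_rda(shells, rda_mg):
    """Share of the RDA (in percent) covered by a given number of eggshells."""
    return 100 * shells * CALCIUM_PER_SHELL_MG / rda_mg


if __name__ == "__main__":
    for group, rda in RDA_MG.items():
        print(f"{group}: about {shells_needed(rda):.2f} eggshells for {rda} mg")
    print(f"Half a shell covers {percent_of_rda(0.5, RDA_MG['adult']):.0f}% of 1000 mg")
```

In practice part of the calcium would come from other foods, so the number of eggshells needed would be lower than what this prints.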

How can you eat an eggshell though?

My process is simple – I often eat boiled eggs for breakfast in the morning. Once I peel the egg, I am left with an eggshell that has already been sanitized by boiling (it’s important to sanitize the eggshells due to the possibility of the Salmonella bacteria being on the shell). I then bake the eggshells at a low temperature of 225F in order to dry them. After they cool down, I simply grind the shells into fine powder using a coffee grinder. The powder can be consumed with a spoon, added to smoothies, oatmeal, dough, etc.

Celiac disease and dairy proteins – summarization of articles

I want to address the issue of whether dairy could be an issue for those with celiac disease. I don’t think that I will be able to arrive at an exact answer with this post, but I do wish to summarize existing articles and evidence on this topic. From my personal experience, I get all the same symptoms from dairy products as from foods containing gluten. The symptoms include pains in the lower abdomen, bloating, constipation, fatigue, inflammation of the eyelids, as well as psychiatric symptoms including panic attacks, anxiety, and depression. Gathering anecdotal evidence by speaking to reddit users in the gluten-free subreddit, multiple individuals have also expressed the same experience with dairy causing similar symptoms to gluten. Also these individuals noticed that the same symptoms were caused by lactose-free products, therefore likely the culprit is not the sugar (lactose), but the proteins in dairy (casein). Below I will summarize several articles addressing the consumption of casein by individuals with celiac disease.

The first study that I found looking at the correlation between gluten and casein is from 2007, Mucosal reactivity to cow’s milk protein in coeliac disease. This article discusses the fact that some celiac patients on a gluten-free diet still experience gastrointestinal symptoms. The authors then examine whether these patients have an inflammatory immune response to the protein in cow’s milk. The results of this study indicated that a fraction of celiac patients did in fact experience a similar reaction to the milk protein as to gluten. As usual, I used Python to create article summaries, including this one.

Summary:
On clinical grounds cow’s milk (CM) protein sensitivity may be suspected. Here, using rectal protein challenge, we investigated the local inflammatory reaction to gluten and CM protein in adult patients with CD in remission.
In 18 of 20 patients gluten challenge induced neutrophil activation defined as increased MPO release and increased NO synthesis.
A mucosal inflammatory response similar to that elicited by gluten was produced by CM protein in about 50% of the patients with coeliac disease.

Summary using LexRank (graph-based method for computing relative importance of sentences):

Mean rectal ΔMPO was 303 ± 27 µg/l after casein challenge and 16 ± 27 µg/l after challenge with α-lactalbumin.
Compared to healthy controls, patients with CD showed significant increases in rectal NO and MPO concentrations measured 15 h after challenge with both CM and gluten (P < 0·001), while ECP was increased to a similar extent in the two groups.
The major finding in this study is that rectal challenge with CM protein frequently induced a local inflammatory mucosal reaction in patients with CD but not in healthy controls.
Our patients with CD had normal serum levels of IgA, IgG and IgE against casein and α-lactalbumin, which might be explained by the fact that they were on a gluten-free diet and therefore had improved the mucosal integrity.
Our finding that, in a fraction of coeliac patients, CM protein challenge may induce an inflammatory reaction of the same magnitude, as did gluten challenge, may also suggest an innate as well as adaptive immune response to CM, and casein in particular.
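For readers curious how a LexRank-style summary like the one above is produced, here is a minimal sketch in plain Python. It is not the code used for this post (which is not shown) and not a library API. It is a simplified illustration that assumes a naive sentence splitter, TF-IDF cosine similarity, a similarity threshold of 0.1, no self-links, and PageRank-style power iteration. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """One sparse TF-IDF vector (a dict) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    doc_freq = Counter(word for tokens in token_lists for word in set(tokens))
    idf = {word: math.log(n / df) + 1.0 for word, df in doc_freq.items()}
    return [{w: c * idf[w] for w, c in Counter(tokens).items()} for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Score sentences by their centrality in the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors(sentences)
    # neighbors[i]: sentences similar enough to sentence i (self-links excluded).
    neighbors = [
        [j for j in range(n) if j != i and cosine(vectors[i], vectors[j]) >= threshold]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no neighbors spread their score evenly over all sentences.
        dangling = sum(scores[i] for i in range(n) if not neighbors[i])
        new_scores = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            for j in neighbors[i]:
                new_scores[j] += damping * scores[i] / len(neighbors[i])
        scores = new_scores
    return scores


def summarize(text, num_sentences=3):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text, num_sentences=5)` on the plain text of an article would return five sentences that share the most vocabulary with the rest of the text. A real pipeline would likely use a proper sentence tokenizer, since the naive splitter here can break on abbreviations.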

There were several other studies on the topic of the gluten-free and casein-free diet, but they all investigated whether this diet would help patients on the autism spectrum, which is not the topic of my post. I did find another short article on a gluten-free and casein-free diet helping with psychotic symptoms. Personally I have a similar experience, as consuming any gluten or dairy increases my paranoia, panic attacks, and intrusive thoughts. The authors propose the following mechanism for psychosis:

“In autism and schizophrenia, incomplete digestion of certain proteins, gluten and casein, cause an autoimmune response as indicated by elevated levels of IgA and IgG antibodies. This intestinal malabsorption also causes pathogenic elements (peptide fractions), which bind to opioid receptors by crossing the blood-brain barrier. This releases exorphins (opiate-like substances, similar to certain drugs) that cause psychotic symptoms.”

Evidence-Based Practice: Introduction of a Gluten-Free and Casein-Free Diet to Alleviate Psychotic Symptoms
A case review of a young boy yielded an unexpected resolution of psychotic symptoms after the introduction of a gluten-free, casein-free (GFCF) diet.
The purpose of this paper is to show that health care professionals may use a gluten-free and casein-free diet (GFCF) as an additional element to standard treatment methods, to alleviate psychotic symptoms.
Additionally noted were similarities between autism and schizophrenia.
Introduction of a GFCF diet helps reduce psychotic symptoms, and gives another option for patients resistant to traditional treatment methods, especially adolescents and young adults.
Keywords: autism, gluten-free, casein-free diet (GFCF), psychosis, schizophrenia

My previous mistake when going on a dairy-free diet: too many food restrictions and not enough calcium

I want to describe my mistakes with my previous attempt at going dairy-free. A bit of background – I started experiencing severe abdominal cramps in my 20s, and then I also started to have panic attacks, fatigue, and swollen eyelids. I had problems waking up in the morning. I ended up being referred to a psychiatrist, but the medications did not help. Finally an endocrinologist checked my antibodies and found that I had very high levels of thyroid antibodies, so my immune system was attacking and damaging my thyroid. I was put on thyroid medication. I also was referred to a neurologist who then diagnosed me with autoimmune encephalitis (brain inflammation), and I was treated with intravenous steroids (for immunosuppression). At the same time I started reading online a lot about autoimmune diseases and I came across articles about the AIP diet. I was feeling so unwell that I decided I had to change my lifestyle, and I started following the AIP diet strictly – no dairy, no gluten, no soy, no grains, no legumes, no nuts, no chocolate, no alcohol. There were a lot of restrictions! You can google this diet, if you are curious.

After the corticosteroid treatment and the diet change, I did start feeling better, and I also lost 20kg, but I still experienced a lot of symptoms such as irritability, leg spasms, a feeling of numbness in my fingers, and insomnia. I ended up deciding that there was no scientific evidence for my dietary restrictions, and at some point I went back to eating dairy and gluten, as well as the rest of the foods. I ended up gaining 30kg and started to experience paranoia, panic attacks, nightmares, and fatigue again.

I recently decided to look into my diet again and instead of going into the extremes – such as the very strict AIP diet, I started with excluding dairy. I also realized that when I was dairy-free the first time, I did not consume any foods with calcium, and that could have been the cause of my muscle cramps and numbness in my hands. This time I looked into non-dairy sources of calcium and calculated how much of those foods I would need to be eating. I have now been dairy free since February, I also went gluten-free and soy-free, as I noticed through multiple observations, that those foods were also causing symptoms for me. I now no longer have any pains in the lower abdomen, I have more energy and was able to attend yoga classes. I have no symptoms of low calcium this time, as I eat canned sardines, canned salmon with bones, and powdered egg shells. I am feeling much better, and I have lost around 22 pounds since February.

Obtaining calcium on a dairy-free diet

I have been on a dairy-free diet since February and for me personally it has been helping with the brain fog issue, scalp eczema, and experiences of fear/panic attacks/existential anxiety. I have tried all sorts of dairy – fermented, non-fermented, lactose-free, goat milk, sheep milk, kefir, yogurt. In the end, I found that I feel better on the days when I do not consume dairy and that the negative effects of it make it not worth it for me to continue consuming it. Some dairy has significant levels of the vitamin B12 and iodine, but those nutrients are not difficult to find elsewhere. Chicken liver, eggs, and salmon contain enough vitamin B12. Eggs also contain iodine and I use iodized salt when cooking. The main element I was concerned about when I stopped eating dairy was calcium. I tried taking calcium supplements for a while but I experienced quite negative side-effects such as frequent urination, constipation, and abdominal bloating. I decided to therefore look for natural sources of calcium that are dairy-free.

It’s possible to obtain enough calcium from fortified plant milks, and that might be the easiest option, if that works for you. I’m not sure though that it would be any different from just taking a calcium supplement, because plant milks are fortified with the same calcium carbonate. I am also not a fan of fortified plant milks as there is some evidence that consuming a lot of extra vitamin B12 and folate, which are added to plant milks in high amounts, can increase the risk of some cancers.

Folic Acid and Vitamin B12 Supplementation and the Risk of Cancer: Long-term Follow-up of the B Vitamins for the Prevention of Osteoporotic Fractures (B-PROOF) Trial

Also I found that fortified plant milks increased my anxiety and occurrence of nightmares, and I found the same effects when I was taking a B12 supplement on its own. I consume enough B12 through eggs, meat, and fish, and enough folate through legumes and fruits, therefore my assumption is that the extra vitamins from fortified plant milks had negative effects for me. If you have no problem with vegan substitute products, then you can easily obtain enough calcium without dairy by consuming fortified plant milks, vegan yogurts, and vegan cheese. Tofu also usually has calcium sulfate added to it. Unfortunately I experienced worsening of scalp eczema when eating soy frequently, but if soy is not a problem for you, fortified soy milk and tofu are good sources of calcium.

Since I am currently not consuming a lot of soy products and not consuming plant milks, I had to find other sources of calcium. After looking through nutritional info for various foods, I found that the best way to get enough calcium is to consume bones. Most of the calcium is stored in bones, and only small amounts are found in blood and tissues.
The easiest way to consume bones is by eating canned salmon and canned sardines. You can check the nutrition info on the cans; the salmon needs to be “with skin and bones”, not just the fillet. The can of salmon that I bought contains in total 363mg of calcium, which is about 30% of daily value (DV). A can of sardines can contain up to 40% of DV of calcium. You can eat the fish right out of the can or add it to soup. Another source of calcium is chia seeds: two tablespoons contain around 18% of DV. You can add chia seeds to oatmeal in the morning. Another way to eat some bones is by cooking bone broth. There is no evidence that much calcium leaches into the broth itself; the amount is minimal, so you would have to eat the softened bones themselves. I usually cook a broth with chicken or turkey bones for at least an hour, then they are soft enough to actually eat.

Best natural sources of calcium: canned sardines, canned salmon, chicken/turkey bones from bone broth, chia seeds
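To get a rough idea of how these foods add up over a day, here is a small Python sketch. The salmon figure is the 363mg from the can described above; the sardine and chia figures are converted from the %DV values in the text, assuming a label daily value of 1300mg (an assumption that is consistent with 363mg being "about 30%"). The 1000mg target is the adult RDA mentioned in the eggshell post. The names and serving sizes are illustrative only, and the sardine value is an upper bound ("up to 40%").

```python
DAILY_VALUE_MG = 1300  # assumption: DV used on the labels; check your own cans
TARGET_MG = 1000       # adult RDA quoted in the eggshell post

# Calcium per serving in mg.
FOODS_MG = {
    "canned salmon (1 can)": 363,                         # from the label in the text
    "canned sardines (1 can)": 40 * DAILY_VALUE_MG / 100,  # "up to 40% of DV"
    "chia seeds (2 tbsp)": 18 * DAILY_VALUE_MG / 100,      # "around 18% of DV"
}


def total_calcium(servings):
    """Total calcium in mg for a dict of {food name: number of servings}."""
    return sum(FOODS_MG[food] * count for food, count in servings.items())


def percent_of_target(servings, target_mg=TARGET_MG):
    """Share of the daily target (in percent) covered by the given servings."""
    return 100 * total_calcium(servings) / target_mg


if __name__ == "__main__":
    day = {"canned salmon (1 can)": 1, "chia seeds (2 tbsp)": 1}
    print(f"{total_calcium(day):.0f} mg, {percent_of_target(day):.0f}% of target")
```

With these assumed numbers, one can of salmon plus two tablespoons of chia seeds comes to roughly 600mg, so the rest of the day's calcium would need to come from other sources such as sardines, softened bones, or eggshell powder.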

Why do humans need calcium?

Calcium is required for multiple processes in the human body. 99% of the calcium is stored in bones and only 1% is found in blood and tissues. When there is not enough calcium in the blood, the parathyroid hormone will signal the bones to release calcium. Therefore it’s important to consume enough calcium so that it doesn’t have to keep being released from bones, as you don’t want to weaken your bones. For women especially, it is important to receive enough calcium from diet. In all humans after the age of 30, bone destruction usually exceeds bone production. Women can experience greater bone loss after menopause as the levels of hormones that play a role in building bones become permanently lower.

Calcium is needed for nerves to carry signals between brain and other body parts, it’s also needed for muscles to move. Calcium is required for muscle contraction, blood clotting, regulating heart rate, and cell fluid balance. Not having enough calcium can lead to painful muscle spasms, twitching of muscles, numbness or tingling in feet, hands, and mouth. It can also lead to anxiety, depression, itchy skin, and tiredness.

Simple gluten-free and dairy free breakfast

I have decided to go not only dairy-free, which I have been doing for a while, but also gluten-free again. This has led me to re-plan my meals. When I previously followed a dairy-free and gluten-free diet during 2016-2018, I was able to lose a lot of weight. I can’t be certain that it was specifically the avoidance of gluten and dairy, as I was also taking Cytomel (a synthetic version of the T3 thyroid hormone), and Cytomel is known to possibly lead to significant weight loss. I started eating gluten again three years ago, as I was not convinced that there is such a condition as gluten intolerance (without having celiac disease). I am still not convinced that I have gluten intolerance, but being currently quite overweight, and not being able to lose the extra weight, I decided to try the gluten-free diet again. I don’t think that I will miss out on any vitamins by giving up gluten, as my diet is varied enough to obtain all the essentials. In any case, I can always return to eating gluten if I do not observe any effects of a gluten-free diet on my weight.

I have already been eating healthily, in my opinion, but now my meals require some re-planning. It’s no longer possible to eat a rye toast with hummus in the morning, or a smoked salmon whole-wheat sandwich for lunch. I also don’t really enjoy cooking and would like this process to be as simple as possible. In the morning I have a small gap between the time that the nanny arrives and the time at which I have to start work. Therefore, breakfast preparation has to be especially quick. Below is my idea for a breakfast meal that requires very little cooking or waiting. It also does not contain any refined carbohydrates.

Meal: boiled eggs, a gluten-free tortilla with hummus, oatmeal with coconut milk and berries

Main protein: one or two boiled eggs (depending on how hungry you are)
Start cooking the eggs as the first step, as other items will take a shorter time to prepare.
One egg contains about 7 grams of protein and 1.6 grams of saturated fat; eggs do not contain carbohydrates. One egg has about 0.6 micrograms of B12, which is 25% of the daily value. Therefore by eating two eggs in the morning, I can obtain 50% of my DV of B12! Eggs also contain folate, riboflavin, iodine, and selenium.

Additional protein: hummus with a gluten-free tortilla or toast
Hummus is made out of chickpeas and tahini, chickpeas are relatively high in protein and folate, also B6 and magnesium. I used sweet potato tortillas, which were OK, but any other gluten-free tortillas or bread would do:
https://www.bfreefoods.com/us/products/sweet-potato-wraps/

Starch / carbohydrates: quick steel cut oats
I stumbled upon these oats on Amazon and I found this product pretty useful. The oats can be prepared in a microwave in 2.5 minutes. I mix them with coconut milk powder before microwaving.
https://www.amazon.ca/Post-Original-Instant-Oatmeal-Flaxseeds/dp/B08X8HLXSP

Oats are known to contain beta-glucan soluble fiber, which contributes to gut health. Beta-glucan fiber may also prevent sharp increases in blood sugar after meals. Oats are also a source of thiamine, magnesium, zinc, and phosphorus.

Fat: coconut milk powder
Coconut milk contains a type of fat called medium-chain triglycerides (MCTs). MCTs could potentially aid in weight loss and increase insulin sensitivity. Also it is creamy! And since this breakfast idea is dairy free, coconut milk is what makes the oatmeal taste better.

Additional items: add any berries, nuts, dark chocolate chips to your oatmeal
I added strawberries. Strawberries contain a high amount of vitamin C and also contain folate and manganese.
Still hungry? Slowly eat a whole celery stick. It is low in calories but can help you feel full. It’s difficult to eat celery quickly, and eating slowly can help pass the time, until your brain finally signals that you are satiated.

Drink: tea or coffee
I do not drink any juices, as most juices have a very high glycemic index.