INDEX
Explanations
words related to borrowing or names including "Lor"
repeated mentions of specific names and references to boredom
Negative Logits
ted       -0.88
eers      -0.78
ulate     -0.77
ilus      -0.77
hered     -0.76
paio      -0.75
cius      -0.75
ulates    -0.73
erate     -0.72
ichick    -0.71
Positive Logits
hood      0.79
ussia     0.73
iculture  0.70
atively   0.69
atives    0.68
OUGH      0.68
UST       0.68
asta      0.68
ingo      0.67
Adin      0.64
Activations Density: 0.090%