Explanations
Prepositions, particularly the word “de” in various contexts
Negative Logits
Anſ        -0.88
raiſ       -0.83
itſelf     -0.79
―――――      -0.75
purpoſe    -0.73
Majefty    -0.72
Asturias   -0.72
Kanna      -0.72
himſelf    -0.71
iſt        -0.71
Positive Logits
de         1.18
De         1.07
Σε         0.95
De         0.94
DE         0.92
indd       0.92
the        0.89
員の       0.84
بوابة      0.84
DeV        0.83
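The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (as on Neuronpedia-style dashboards) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names, toy vectors, and scores below are hypothetical and are not taken from this dashboard.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: score(token) = dot(decoder_direction, unembedding[token]).
# All names and numbers are illustrative only.

def logit_effects(decoder_dir, unembed_cols):
    """Return a dict mapping each token to dot(decoder_dir, its unembedding vector)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    top = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return top, bottom

# Toy example with a 2-dimensional residual stream.
decoder_dir = [1.0, 0.5]
unembed_cols = {
    "de": [1.0, 0.4],
    "the": [0.8, 0.2],
    "itſelf": [-0.6, -0.4],
    "Anſ": [-0.8, -0.2],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed_cols), k=2)
print(top)     # "de" and "the" rank highest
print(bottom)  # "Anſ" and "itſelf" rank lowest
```

Sorting the full score dictionary twice keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.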
Activation density: 0.027%
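Activation density is commonly reported as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the helper name `activation_density` and the toy data are hypothetical. At 0.027%, roughly 27 of every 100,000 tokens would activate this feature.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# Assumption: a token counts as "active" when the activation exceeds threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data: 27 active positions out of 100,000.
acts = [0.0] * 99_973 + [1.5] * 27
print(f"{activation_density(acts):.3f}%")  # 0.027%
```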