INDEX
Explanations
instances of speech or quotations in the text
Negative Logits
Token         Logit
imony         -0.15
atat          -0.15
conclusion    -0.15
ilver         -0.14
Bik           -0.14
ondo          -0.14
htub          -0.14
ella          -0.14
ahi           -0.14
imon          -0.14
Positive Logits
Token         Logit
JUnit         0.16
piger         0.15
anale         0.15
canh          0.14
.Invariant    0.14
→→            0.14
bis           0.14
errer         0.13
σα            0.13
Intelligence  0.13
Activations Density 0.054%