INDEX
Explanations
dialogue and quotations within the text
Negative Logits
ads      -0.15
803      -0.14
iat      -0.14
çIJĨ      -0.14
bing     -0.14
ardin    -0.14
æŁĦ      -0.14
atto     -0.13
126      -0.13
illing   -0.13
Positive Logits
sWith     0.18
oline     0.17
edo       0.15
ervas     0.15
eel       0.15
sd        0.15
ediÄŁi    0.14
ollider   0.14
obsolete  0.14
edral     0.14
Activations Density: 0.020%