INDEX
Explanations
references to the pronoun "it" and similar indicators
Negative Logits
to              -0.63
“               -0.59
up              -0.59
in              -0.56
more            -0.55
ступил          -0.54
far             -0.53
(not rendered)  -0.53
impress         -0.52
abos            -0.51
Positive Logits
purpoſe         0.84
nakalista       0.83
Majefty         0.83
fubject         0.81
Houſe           0.80
pleaf           0.80
Obrázky         0.79
httphttps       0.79
"..\..\         0.78
Darauf          0.77
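Feature dashboards of this kind commonly build the two logit lists by multiplying the feature's decoder direction through the model's unembedding matrix, then showing the k lowest scores (most negative first) and the k highest scores (most positive first). The sketch below assumes that convention. `logit_scores` and `top_and_bottom` are hypothetical helpers, and the toy vectors are invented for illustration rather than taken from this feature.

```python
from typing import Sequence


def logit_scores(decoder_dir: Sequence[float],
                 unembed: Sequence[Sequence[float]]) -> list[float]:
    """Assumed convention: score[t] = sum_i decoder_dir[i] * unembed[i][t].

    decoder_dir has length d_model; unembed is a d_model x vocab matrix.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int):
    """Return (bottom_k, top_k) as (token, score) pairs.

    bottom_k is ordered most negative first and top_k most positive first,
    matching the ordering of the lists above.
    """
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    top = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return bottom, top


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 0.9, -0.4],
           [0.6, -0.2, 0.1]]
vocab = ["to", "purpoſe", "up"]
scores = logit_scores(decoder_dir, unembed)
bottom, top = top_and_bottom(scores, vocab, k=1)
```

Here `bottom` holds `("up", -0.45)` and `top` holds `("purpoſe", 1.0)` up to floating-point rounding. A real dashboard would do the same projection with a matrix library over the full vocabulary; plain lists are used only to keep the sketch self-contained.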
Activations Density 0.047%
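Activation density is typically reported as the percentage of sampled tokens on which the feature's activation is above zero, so 0.047% corresponds to roughly 47 firing tokens per 100,000. The sketch below assumes that definition and a threshold of zero; `activation_density` is a hypothetical helper, and the activation values are synthetic.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    fired = 0
    for a in activations:
        total += 1
        if a > threshold:
            fired += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * fired / total


# Synthetic example: 47 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(47):
    acts[i * 2000] = 1.3
density = activation_density(acts)
```

With these synthetic values, `density` is 0.047 (percent), the same figure shown above.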