Explanations
assertions and statements about the existence or state of objects and events
Negative Logits
acr        -0.15
_SIG       -0.14
ÃŃst       -0.13
trusted    -0.13
bunları    -0.13
beden      -0.13
óz         -0.12
icular     -0.12
argin      -0.12
essaging   -0.12
Positive Logits
ours        0.22
mine        0.21
mine        0.20
what        0.19
from        0.19
done        0.19
happening   0.19
theirs      0.19
on          0.19
why         0.18
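The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The sketch below shows one way such a ranking could be produced. It assumes that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, with no LayerNorm or other scaling. `logit_effects` and `top_bottom` are hypothetical helpers, and the vectors are toy values, not real model weights.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect, ASSUMING
# effect(token) = decoder_direction . W_U[:, token] (no LayerNorm, no scaling).
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy 2-dimensional example with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "ours": [0.2, 0.1],
    "mine": [0.1, 0.2],
    "acr": [-0.1, -0.2],
    "_SIG": [-0.05, -0.1],
}
pos, neg = top_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in pos], [t for t, _ in neg])  # ['ours', 'mine'] ['acr', '_SIG']
```

Sorting the full score dictionary is simple but costs O(V log V) for a vocabulary of size V. With a real vocabulary, a partial selection such as `heapq.nlargest` would avoid the full sort.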
Activation Density: 1.296%
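Activation density summarizes how often the feature fires. The sketch below assumes it is the percentage of tokens in a sample on which the feature's activation is strictly above zero. That definition, the `activation_density` helper, and the sample size are illustrative assumptions, not the dashboard's documented method.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# activation exceeds a threshold (ASSUMED to be 0.0, i.e. "the feature fired").
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Made-up sample: the feature fires on 1,296 of 100,000 tokens.
acts = [0.0] * 98_704 + [0.5] * 1_296
print(f"{activation_density(acts):.3f}%")  # prints 1.296%
```

Under this definition, a density near 1% means the feature is active on roughly one token in a hundred.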