INDEX
Explanations
phrases indicating possession or ownership in relation to questions, suggestions, and actions
Negative Logits
ichert    -0.19
OnInit    -0.16
uelle     -0.16
ообраз    -0.14
inho      -0.14
olley     -0.14
ustral    -0.14
no        -0.14
/Gate     -0.14
pone      -0.13
Positive Logits
jde             0.19
however         0.18
SOME            0.16
bit             0.15
though          0.15
some            0.15
occ             0.14
occasionally    0.14
DTD             0.14
However         0.14
Activations Density 0.083%