INDEX
Explanations
interrogative phrases and questions
Negative Logits
rique     -0.17
βα        -0.15
Sunder    -0.15
agos      -0.15
iore      -0.14
пара      -0.14
tek       -0.14
aines     -0.14
king      -0.14
nez       -0.14
Positive Logits
other      0.17
idor       0.17
arters     0.16
algun      0.15
any        0.15
_aff       0.15
iment      0.14
皆         0.14
something  0.14
ellite     0.14
Activations Density 0.072%