Explanations
negations or expressions of denial in various contexts
Negative Logits
til         -0.15
WS          -0.14
acid        -0.14
Inactive    -0.14
erts        -0.14
prim        -0.14
ady         -0.14
ëĭī         -0.14
Til         -0.14
arp         -0.13
Positive Logits
necessarily  0.23
innamon      0.16
matter       0.16
ecessarily   0.16
बर           0.15
á»ĵn         0.15
olet         0.14
доз          0.14
Uvs          0.14
ISCO         0.14
Activations Density 0.096%