Explanations
negations and phrases suggesting a lack of agreement or compliance
Negative Logits
anas      -0.16
ats       -0.16
ony       -0.15
izon      -0.15
allee     -0.14
وار       -0.14
Leh       -0.14
<tag      -0.14
TLS       -0.13
ascade    -0.13
Positive Logits
merely    0.19
.only     0.19
mere      0.18
mere      0.17
hanya     0.16
ेवल       0.16
只是      0.15
Statics   0.15
роп       0.14
only      0.14
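Dashboards of this kind usually obtain the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that convention (no extra normalization or final layer norm applied). The function names, the toy vocabulary, and the toy matrices are hypothetical and are used for illustration only.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: score every vocabulary token by decoder_dir @ unembed.

    decoder_dir: the feature's decoder vector, length d_model (assumed).
    unembed: d_model x vocab_size unembedding matrix, given as a list of rows (assumed).
    """
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for d, weight in enumerate(decoder_dir):
        row = unembed[d]
        for t in range(vocab_size):
            scores[t] += weight * row[t]
    return scores

def top_bottom_tokens(scores: Sequence[float], vocab: Sequence[str],
                      k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative

# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
vocab = ["only", "merely", "TLS", "ats"]
unembed = [[0.3, 0.1, -0.1, -0.2],
           [0.0, 0.2, -0.1, 0.0]]
scores = logit_effects([1.0, 0.5], unembed)
positive, negative = top_bottom_tokens(scores, vocab, k=2)
print("Positive:", positive)
print("Negative:", negative)
```

With real weights, `k = 10` would produce two ten-row lists shaped like the ones above. The toy numbers are not the values from this feature.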
Activation Density: 0.241%
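Activation density is commonly read as the percentage of sampled token positions on which the feature fires. The sketch below assumes that definition and treats any activation strictly above a threshold of 0 as "firing". The helper name, the threshold, and the 100,000-token sample are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Under this definition, 0.241% corresponds to 241 active positions per 100,000 tokens.
sample = [0.0] * 99_759 + [1.0] * 241
print(f"{activation_density(sample):.3f}%")
```

A density this low means the feature stays silent on the vast majority of tokens.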