Explanations
negations and comparative phrases
Negative Logits
onium      -0.14
anson      -0.12
plements   -0.12
-нибудь    -0.12
큼         -0.12
itě        -0.12
Broad      -0.12
стара      -0.12
yped       -0.11
_phys      -0.11
Positive Logits
only       0.85
ONLY       0.74
only       0.74
solely     0.72
Only       0.66
Only       0.65
_only      0.64
ONLY       0.61
только     0.60
-only      0.60
Activations Density 0.466%