Explanations
words that suggest binaries or oppositional concepts
Negative Logits
imb -0.15
aml -0.15
arine -0.15
ond -0.15
uel -0.15
late -0.15
Im -0.14
urma -0.14
ylum -0.14
eer -0.14
Positive Logits
aku 0.17
å¢ 0.16
zier 0.14
afone 0.13
awan 0.13
757 0.13
RequestMethod 0.13
effectively 0.13
ê 0.13
ाà¤ı 0.13
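Lists of positive and negative logits like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the per-token results. The sketch below illustrates that idea in plain Python. It is an assumption about how such lists are typically computed, not a description of this dashboard's actual pipeline. The function name `top_logit_effects` and the layout of `unembed` (d_model rows by vocab_size columns) are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive
    token effects of one feature direction.

    Assumes the effect on token t is dot(decoder_dir, unembed[:, t]),
    with unembed stored as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_dir)
    # Dot product of the feature direction with each token's unembedding column.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: the start holds the most negative tokens.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive
```

With a toy two-dimensional model, calling `top_logit_effects([1.0, 0.5], [[1, 0, -1], [0, 1, 0]], ["a", "b", "c"], k=1)` ranks "c" as the most suppressed token and "a" as the most promoted one.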
Activation Density 0.039%
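Activation density is usually read as the percentage of tokens in an evaluation sample on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The sampling corpus and exact threshold used for the figure above are not stated, so treat both as assumptions. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    # Guard against an empty sample instead of dividing by zero.
    return 100.0 * active / total if total else 0.0
```

Under this definition, a density of 0.039% corresponds to roughly 39 active tokens per 100,000 sampled.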