Explanations
phrases containing a mix of special characters and letters in a specific pattern
special characters or sequences of non-standard textual elements
Negative Logits
Token         Logit
bonded        -0.83
cius          -0.82
zanne         -0.76
bour          -0.76
dominated     -0.76
iche          -0.76
hoax          -0.75
indicted      -0.75
rigged        -0.69
presidency    -0.68
Positive Logits
Token         Logit
ãĤĭ           1.91
ãģĦ           1.88
ãģ            1.88
ãģ¾           1.78
ãĤ            1.78
ãĤī           1.77
ãģª           1.76
ãģĹ           1.75
ãģŁ           1.72
ãģĵ           1.68
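
The positive-logit tokens above are not display corruption. They match the way a GPT-2-style byte-level BPE vocabulary spells raw bytes as printable characters. The page does not name the tokenizer, so this is an assumption. Under it, the following minimal sketch rebuilds the standard byte-to-character table and maps each token back to its bytes. The helper names `token_to_bytes` and `describe` are hypothetical and chosen only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Convert a vocabulary string back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Return the decoded text, or a hex dump if the bytes are an incomplete UTF-8 sequence."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return "<partial UTF-8: " + raw.hex() + ">"


if __name__ == "__main__":
    for tok in ["ãĤĭ", "ãģĦ", "ãģ", "ãģ¾", "ãĤ", "ãĤī", "ãģª", "ãģĹ", "ãģŁ", "ãģĵ"]:
        print(tok, "->", describe(tok))
```

Read this way, the three-character tokens decode to single hiragana characters, while the two-character tokens `ãģ` and `ãĤ` are the leading two bytes of a three-byte UTF-8 sequence and only become a character once the next token supplies the final byte.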
Activations Density 0.014%