INDEX
Explanations
statements related to public figures or public discourse
Negative Logits
+#+#             -1.05
myſelf           -0.94
principalColumn  -0.94
ſeveral          -0.89
AutoScaleMode    -0.88
Monfieur         -0.86
Diweddarwch      -0.86
ſmall            -0.86
esternos         -0.85
GEBURTS          -0.84
Positive Logits
$                0.40
いわ             0.39
dimenti          0.39
last             0.38
->               0.37
explained        0.36
مئ               0.36
note             0.36
ev               0.35
…                0.35
Activations Density: 0.142%