Explanations
references to the name "Mad" and its variations
Negative Logits
auf      -0.19
yük      -0.17
aat      -0.17
erce     -0.17
aeda     -0.17
sı       -0.16
ingt     -0.16
šek      -0.15
orde     -0.15
INGTON   -0.15
Positive Logits
agascar  0.34
ison     0.34
eline    0.33
onna     0.31
ras      0.30
cap      0.29
rig      0.29
emo      0.27
dest     0.27
ame      0.26
Activation Density: 0.015%