Explanations
the name "Ko" followed by a number
Negative Logits
senal       -0.80
Creed       -0.77
glass       -0.74
ب           -0.73
narrator    -0.73
à¨          -0.73
Ö¼          -0.71
天          -0.69
IBLE        -0.68
ingham      -0.66
Positive Logits
zzi         1.18
osta        1.10
zy          0.98
essler      0.96
jo          0.96
etter       0.96
unin        0.95
eln         0.95
ppa         0.95
pps         0.94
Activations Density: 0.015%
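
The density figure can be read as the fraction of token positions on which the feature fires at all. A minimal sketch of that calculation follows, assuming density means "share of positions with activation above zero"; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature is active; the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 3 active positions out of 20,000 tokens.
sample = [0.0] * 19997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")  # prints 0.015%
```

A density this low means the feature is silent on nearly every token, which fits a narrow explanation such as a specific name followed by a number.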