INDEX
Explanations
references to politics and social issues
Negative Logits
zap      -0.18
suc      -0.17
itler    -0.16
chen     -0.15
andro    -0.15
ester    -0.14
/http    -0.14
holm     -0.14
нка      -0.14
bed      -0.14
Positive Logits
ugi         0.17
æį          0.16
ãĥ³ãĥIJ      0.15
-prepend    0.15
880         0.14
ĶåĽŀ        0.14
ÑģÑĤи       0.14
ëª          0.14
ikon        0.14
Hag         0.14
Activations Density 0.055%