INDEX
Explanations
references to Bill and Hillary Clinton
Negative Logits
BS      -0.15
ward    -0.15
Patri   -0.15
occo    -0.14
Wed     -0.14
ầm      -0.14
appa    -0.14
mino    -0.13
weis    -0.13
udent   -0.13
Positive Logits
istrovství   0.17
legg         0.15
Ruf          0.15
rell         0.15
ilde         0.15
abad         0.14
яти          0.14
jerne        0.14
.ov          0.14
ersh         0.14
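Token lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that convention. The names `decoder_direction`, `W_U`, `vocab` and `top_logit_effects` are hypothetical, and the arrays are filled with random numbers rather than this feature's real weights, so the printed values are illustrative only.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: a feature's direct effect on each vocabulary logit is
    decoder_direction @ W_U (decoder row projected through the unembedding).
    """
    effects = decoder_direction @ W_U          # one value per vocabulary token
    order = np.argsort(effects)                # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy setup: 8-dimensional residual stream, 20-token vocabulary.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_direction = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_effects(decoder_direction, W_U, vocab, k=5)
print("Negative Logits")
for token, value in neg:
    print(f"{token}\t{value:.2f}")
print("Positive Logits")
for token, value in pos:
    print(f"{token}\t{value:.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other: every listed negative value is at most every listed positive value.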
Activations Density: 0.004%
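A density of 0.004% means the feature fires on roughly 4 of every 100,000 tokens. The sketch below assumes density is defined as the fraction of token positions whose activation is above zero. The helper `activation_density` and the synthetic activation array are hypothetical and chosen only to reproduce that arithmetic, not taken from this feature's data.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# Synthetic example: 100,000 token positions, the feature fires on 4 of them.
acts = np.zeros(100_000)
acts[:4] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.004%
```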