Explanations
references to universities and educational institutions
Negative Logits
irt      -0.17
anou     -0.17
akh      -0.15
uy       -0.15
çķ¥      -0.15
vest     -0.15
ONO      -0.15
bsp      -0.14
inou     -0.14
ifact    -0.14
Positive Logits
Notre      0.20
Southern   0.18
Conce      0.17
Judaism    0.16
Phoenix    0.16
Evans      0.16
Arizona    0.16
Mary       0.16
(blank)    0.16
Pret       0.16
Activations Density: 0.015%