Explanations
references to educational institutions, specifically universities
Negative Logits
Token           Logit
Universities    -0.18
out             -0.17
ington          -0.16
usual           -0.16
alin            -0.16
/use            -0.16
rou             -0.15
usc             -0.15
universities    -0.15
User            -0.15
Positive Logits
Token           Logit
-level          0.21
-wide           0.20
/un             0.20
town            0.19
ois             0.19
(Un             0.19
Press           0.19
å®Ļ             0.18
wide            0.18
presses         0.18
Activations Density: 0.046%
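
The positive-logit token å®Ļ looks like the display form that byte-level BPE tokenizers use for raw bytes, not ordinary text. The sketch below assumes the vocabulary follows the GPT-2 byte-to-character mapping, which the page does not confirm; `decode_token` is a hypothetical helper name introduced here for illustration. Under that assumption it maps each displayed character back to its byte and decodes the result as UTF-8.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a displayed token back into the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # Tokens may hold partial UTF-8 sequences, so replace bytes that do not decode.
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("å®Ļ")
print(decoded)
```

If the assumption holds, the three displayed characters correspond to the bytes E5 AE 99, a single three-byte UTF-8 character. Plain ASCII tokens such as `Press` pass through unchanged.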