INDEX
Explanations
names of universities and colleges
references to various universities and colleges
Negative Logits
usting    -0.83
ttle      -0.74
eto       -0.69
._        -0.68
erity     -0.66
retty     -0.64
vous      -0.63
.—        -0.62
?:        -0.61
ura       -0.60
Positive Logits
lith            0.66
differential    0.63
FI              0.63
delinqu         0.61
cellul          0.59
Cyborg          0.59
Neph            0.58
hydra           0.58
Lith            0.58
phan            0.57
Activations Density: 0.271%