Explanations
proper nouns related to educational institutions, particularly Harvard University
references to Harvard
Negative Logits
Token           Logit
mble            -0.84
anwhile         -0.82
uary            -0.79
nesota          -0.78
eers            -0.77
xual            -0.76
transitional    -0.76
ictionary       -0.76
atoon           -0.73
unct            -0.72
Positive Logits
Token           Logit
Har             1.30
riet            1.14
vard            1.02
vey             0.90
rod             0.89
emin            0.84
rington         0.81
vest            0.79
baugh           0.76
ihar            0.76
Activations Density: 0.005%