INDEX
Explanations
mentions or references to Harvard University
references to Harvard University
Negative Logits
odic      -0.79
alez      -0.78
oute      -0.77
ktop      -0.76
afort     -0.72
odcast    -0.71
phabet    -0.71
leased    -0.70
atoon     -0.70
utm       -0.67
Positive Logits
Yard          1.01
University    0.96
Crimson       0.91
graduates     0.83
Kennedy       0.81
uates         0.80
Square        0.80
Harvard       0.79
Institution   0.78
undergrad     0.76
Activations Density: 0.008%