INDEX
Explanations
mentions of academic institutions, particularly Yale University
mentions of Yale University
Negative Logits
Token       Logit
words       -0.73
leased      -0.71
gotten      -0.69
ossible     -0.68
former      -0.67
achi        -0.66
held        -0.66
drawn       -0.66
ulatory     -0.65
asta        -0.65
Positive Logits
Token       Logit
Yale        1.01
uates       0.99
GOODMAN     0.92
University  0.86
Haram       0.85
©¶æ¥µ       0.79
Cornell     0.78
bilt        0.77
undergrad   0.77
alumni      0.77
Activations Density 0.002%