INDEX
Explanations
mentions of the word "Stanford."
references to Stanford University
Negative Logits
Token        Logit
alez         -0.76
hops         -0.71
cedented     -0.70
unct         -0.68
ramid        -0.67
merce        -0.67
substant     -0.66
usher        -0.65
interpret    -0.64
odic         -0.63
Positive Logits
Token        Logit
Institution  1.06
University   0.99
Stanford     0.89
Cardinal     0.88
Hills        0.84
thal         0.82
Alto         0.79
Marriott     0.79
Researchers  0.77
Hanson       0.77
Activations Density: 0.008%