Explanations
references to "Stanford" University
mentions of "Stanford"
Negative Logits
alez        -0.77
unal        -0.75
cedented    -0.75
usher       -0.72
merce       -0.69
hops        -0.67
substant    -0.66
unct        -0.66
olitan      -0.66
ramid       -0.66
Positive Logits
Stanford     0.98
Institution  0.96
University   0.89
Cardinal     0.85
thal         0.84
sburg        0.80
Alto         0.77
Haram        0.76
Olson        0.74
Hills        0.73
Activation Density 0.009%
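The logit lists and the density figure above are typically derived along these lines. What follows is a minimal sketch, not the dashboard's actual code. It assumes that a feature's logit effect is its decoder direction multiplied by the model's unembedding matrix (the real dashboard may also fold in a final LayerNorm scale or other normalization, which is omitted here), and that activation density means the fraction of tokens on which the feature's activation is greater than zero. The function names `logit_effects` and `activation_density`, and all of the toy data, are hypothetical.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, k=10):
    """Hypothetical helper: project one feature's decoder direction through
    the unembedding (assumed: no LayerNorm folding or other normalization).

    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab).
    Returns (top_positive, top_negative) as lists of (token_id, value).
    """
    effects = decoder_dir @ W_U          # one logit effect per vocab token
    order = np.argsort(effects)          # ascending order of effect
    top_neg = [(int(i), float(effects[i])) for i in order[:k]]
    top_pos = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

def activation_density(acts):
    """Hypothetical helper: fraction of tokens where the feature fires (> 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > 0))

# Toy data only; real values come from a trained SAE and model.
rng = np.random.default_rng(0)
decoder_dir = rng.normal(size=16)        # toy d_model = 16
W_U = rng.normal(size=(16, 50))          # toy vocabulary of 50 tokens
pos, neg = logit_effects(decoder_dir, W_U, k=10)

acts = np.zeros(100_000)
acts[:9] = 1.3                           # feature fires on 9 of 100,000 tokens
density = activation_density(acts)
print(f"{density:.3%}")                  # prints 0.009%
```

Under this reading, a density of 0.009% means the feature is active on roughly 9 tokens in every 100,000, which fits a narrow feature tied to a single proper noun.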