INDEX
Explanations
references to "Stanford University."
mentions of "Stanford."
Negative Logits
alez      -0.78
hops      -0.77
usting    -0.71
alam      -0.71
livest    -0.70
ocre      -0.69
usher     -0.69
pelled    -0.66
ombo      -0.66
mble      -0.65
Positive Logits
Institution     0.86
Hills           0.85
Cardinal        0.81
Alto            0.80
University      0.77
thal            0.74
Encyclopedia    0.74
Linear          0.73
Laboratories    0.72
Beach           0.71
Activations Density: 0.022%