INDEX
Explanations
references to Stanford University, especially mentions with high activation values such as 'Stanford 10' and 'Stanford 9'
mentions of the term "Stanford."
Negative Logits
alez        -0.82
hops        -0.76
livest      -0.70
ocre        -0.70
alam        -0.69
mble        -0.69
usting      -0.69
bleacher    -0.68
pelled      -0.66
usher       -0.66
Positive Logits
Institution     0.84
Cardinal        0.80
University      0.76
Hills           0.75
Prison          0.74
Alto            0.73
Linear          0.72
Laboratories    0.70
Encyclopedia    0.69
Eye             0.69
Activations Density 0.021%