INDEX
Explanations
mentions of the institution "Stanford"
references to Stanford University
Negative Logits
hops          -0.75
Sov           -0.74
erate         -0.71
bleacher      -0.64
erous         -0.63
alez          -0.63
stuffing      -0.63
usher         -0.62
DonaldTrump   -0.61
qs            -0.61
Positive Logits
University    1.06
Institution   1.01
Cardinal      0.89
Encyclopedia  0.86
Graduate      0.84
Prison        0.82
Univ          0.80
Hills         0.79
sburg         0.79
Linear        0.78
Activations Density 0.036%