Explanations
proper nouns related to Stanford University
Negative Logits
itte         -1.23
uably        -0.92
nitrogen     -0.87
ichick       -0.84
ombo         -0.82
ersion       -0.81
=/           -0.81
ution        -0.77
Colossus     -0.77
container    -0.76
Positive Logits
izo          1.09
nee          1.00
ews          0.99
chal         0.98
isl          0.96
jong         0.96
ning         0.95
Stan         0.93
vol          0.90
boarding     0.89
Activations Density: 0.461%