Explanations
mentions of the Wharton School
Negative Logits
Token     Logit
outu      -0.18
phin      -0.16
Lawson    -0.15
.FLAG     -0.15
iot       -0.14
atoire    -0.14
urge      -0.14
eteria    -0.14
eg        -0.14
Zaman     -0.14
Positive Logits
Token     Logit
ouser     0.14
ainment   0.14
jid       0.14
opus      0.14
deg       0.14
anzi      0.14
asca      0.14
ibal      0.14
lar       0.14
adius     0.14
Activations Density: 0.001%