Explanations
references to "Penn" or related terms in the text
Negative Logits
er        -0.19
dyn       -0.16
naire     -0.15
erah      -0.15
ress      -0.15
oop       -0.14
cz        -0.14
oque      -0.14
boa       -0.14
_ASCII    -0.14
Positive Logits
sylvania  0.29
Penn      0.21
penn      0.21
Penn      0.20
iless     0.20
elope     0.20
ovation   0.18
hap       0.17
insula    0.17
ngen      0.17
Activations Density: 0.006%