INDEX
Explanations
names that start with "Ly" followed by a single digit
Negative Logits
DERR      -0.77
sburgh    -0.68
raints    -0.64
UID       -0.60
EDITION   -0.59
ardless   -0.59
perture   -0.58
Boards    -0.57
shots     -0.57
urities   -0.56
Positive Logits
onna      1.08
nda       1.07
nton      1.02
comed     1.00
rics      0.98
ric       0.97
rique     0.96
ttle      0.94
onel      0.92
mp        0.92
Activations Density 0.019%