INDEX
Explanations
variations of the word "Val" or similar patterns, possibly related to names or labels
Negative Logits
elly      -0.19
eling     -0.19
eting     -0.18
elli      -0.17
elson     -0.17
esse      -0.16
idon      -0.16
elle      -0.16
ese       -0.15
eca       -0.15
Positive Logits
entine    0.28
entina    0.26
uation    0.24
entin     0.23
leys      0.23
uable     0.23
uations   0.21
val       0.21
ued       0.20
=val      0.20
Activations Density 0.025%