Explanations
numerical values or statistics within the text
Negative Logits
Token      Logit
ellan      -0.16
ele        -0.16
ilan       -0.15
ron        -0.15
urve       -0.15
Downing    -0.15
erland     -0.14
ignon      -0.14
mi         -0.14
dam        -0.14
Positive Logits
Token      Logit
Birch      0.16
cente      0.16
laps       0.15
udo        0.14
lys        0.14
.hardware  0.14
stral      0.14
PRS        0.13
cue        0.13
.gs        0.13
Activations Density: 0.030%