INDEX
Explanations
conditional phrases starting with "if"
Negative Logits
umber        -0.17
venge        -0.16
chedulers    -0.16
asser        -0.15
amber        -0.15
iou          -0.15
our          -0.15
ziel         -0.14
lify         -0.14
vido         -0.14
Positive Logits
rames        0.27
rame         0.21
you          0.21
there        0.20
fy           0.19
they         0.19
/how         0.18
not          0.18
tek          0.18
necessary    0.17
Activations Density 0.008%
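
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
acts = [0.0] * 99_992 + [1.5] * 8
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.008%
```

Under this assumed definition, a density of 0.008% corresponds to roughly 8 active tokens per 100,000, which would be consistent with a sparse feature tied to a narrow pattern such as conditional phrases starting with "if".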