Explanations
phrases indicating alternative options or conditions
Negative Logits
rup               -0.17
aji               -0.16
Depths            -0.15
elan              -0.15
anye              -0.15
icot              -0.14
PTY               -0.14
antz              -0.14
ijd               -0.14
ReuseIdentifier   -0.14
Positive Logits
soon              0.21
soon              0.20
Soon              0.17
close             0.17
dummy             0.16
Bare              0.15
closely           0.15
proxy             0.15
near              0.15
aqu               0.15
Activations Density: 0.099%