INDEX
Explanations
mentions of choices or alternatives
Negative Logits
Token        Logit
roll         -0.16
urgeon       -0.16
<Option      -0.16
oras         -0.15
foon         -0.15
orer         -0.15
<dyn         -0.14
าะ           -0.14
lesi         -0.14
_optional    -0.14
Positive Logits
Token        Logit
ality        0.29
nal          0.24
als          0.22
ally         0.21
nel          0.20
available    0.19
ning         0.19
ALLY         0.18
nement       0.17
:selected    0.17
Activations Density: 0.061%