INDEX
Explanations
questions starting with "why" related to the necessity or importance of various topics or actions
Negative Logits
avou      -0.21
phinx     -0.16
uchi      -0.15
UDO       -0.15
icolor    -0.14
ovid      -0.14
ored      -0.14
ivar      -0.13
cola      -0.13
iek       -0.13
Positive Logits
bother    0.17
Should    0.17
should    0.17
emmel     0.17
Choose    0.17
should    0.16
Consider  0.16
choose    0.16
Should    0.16
nên       0.15
Activations Density: 0.040%