Explanations
conditional phrases and situations that involve risk or potential outcomes
Negative Logits
undermin    -0.17
каж         -0.17
¦y          -0.15
almost      -0.15
æĿī         -0.15
pekt        -0.15
gne         -0.15
rk          -0.14
almost      -0.14
elib        -0.14
Positive Logits
anything    0.28
ever        0.24
/if         0.23
any         0.23
anything    0.23
things      0.22
Anything    0.20
need        0.20
ANY         0.20
needed      0.19
Activations Density: 0.147%