INDEX
Explanations
inquiries and questions about decision-making or actions to take
Negative Logits
tt          -0.15
already     -0.15
only        -0.14
sei         -0.14
Kop         -0.14
acclaimed   -0.14
Foods       -0.13
forks       -0.13
azo         -0.13
Mayer       -0.13
Positive Logits
gain        0.17
doing       0.17
Gain        0.16
gain        0.16
Gain        0.15
do          0.15
doing       0.15
ible        0.15
imler       0.15
etest       0.15
Activations Density 0.085%