INDEX
Explanations
words and phrases related to choices and decision-making
Negative Logits
ly        -0.18
sole      -0.18
thy       -0.16
Sole      -0.16
prises    -0.15
ipa       -0.15
hee       -0.15
acio      -0.15
otos      -0.14
rea       -0.14
Positive Logits
Made      0.20
made      0.19
fulness   0.18
Made      0.18
aint      0.17
able      0.16
indeki    0.16
y         0.16
841       0.15
lessly    0.15
Activations Density 0.042%
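
For readers unfamiliar with the density figure, here is a minimal sketch of how such a percentage is commonly computed. It assumes density means the fraction of token positions on which the feature's activation is above zero. The page does not state its definition, so the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 42 of 100,000 positions.
acts = [1.5] * 42 + [0.0] * 99_958
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.042%"
```

Under that assumption, a density of 0.042% means the feature is active on roughly 4 of every 10,000 token positions, which makes it a sparse feature.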