Explanations
phrases indicating the effectiveness or superiority of actions and decisions
Negative Logits
ieri    -0.17
maal    -0.15
akis    -0.15
Пот     -0.14
井      -0.14
Pot     -0.14
程度    -0.14
MAND    -0.14
552     -0.14
زل      -0.14
Positive Logits
course  0.35
bet     0.34
way     0.32
choice  0.32
option  0.29
thing   0.28
course  0.27
bets    0.27
Course  0.27
move    0.27
Activations Density: 0.096%