Explanations
expressions of assurance, intention, and conditions related to events or decisions
Negative Logits
tdown      -0.16
urette     -0.16
欠         -0.15
ení        -0.15
illez      -0.15
uhan       -0.15
омен       -0.15
UNKNOWN    -0.14
Photon     -0.14
isay       -0.14
Positive Logits
won        1.00
won        0.90
Won        0.89
Won        0.82
WON        0.60
wont       0.59
wouldn     0.56
不会       0.50
Wouldn     0.43
unlikely   0.43
Activations Density 0.372%