Explanations
phrases related to efforts or attempts at actions
actions or attempts to do something
Negative Logits
respected         -0.60
Done              -0.58
Reserved          -0.56
krit              -0.56
reserved          -0.55
theless           -0.52
aming             -0.52
oos               -0.52
Balanced          -0.51
Ħ¢                -0.51
Positive Logits
to                 0.99
unsuccessfully     0.99
to                 0.75
aband              0.68
vain               0.66
endra              0.64
airo               0.62
vernment           0.62
ļéĨĴ               0.61
assassinate        0.61
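The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way dashboards obtain such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the top and bottom k entries; that method is an assumption here, not something the dashboard states. The sketch below shows that ranking step in plain Python. The names `logit_weights` and `top_logits`, the row-per-token layout of `unembed`, and the toy vocabulary are all hypothetical.

```python
def logit_weights(decoder_dir, unembed):
    """Dot the (hypothetical) feature decoder direction with each token's unembedding row.

    unembed is assumed to hold one row of d_model values per vocabulary token.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_logits(weights, vocab, k=10):
    """Return the k most positive and k most negative (token, weight) pairs."""
    pairs = list(zip(vocab, weights))
    # sorted() is stable, so tokens with equal weights keep vocabulary order.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: a 2-dimensional model with a 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
weights = logit_weights([1.0, -0.5], unembed)
pos, neg = top_logits(weights, vocab, k=2)
print(pos)  # [('tok_a', 1.0), ('tok_c', 0.25)]
print(neg)  # [('tok_d', -1.0), ('tok_b', -0.5)]
```

In a real model, the same token string can appear twice in such a list (as "to" does above) because the tokenizer can hold distinct tokens that render identically once a leading space is stripped for display.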
Activation density: 0.088%
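Activation density is the share of token positions on which the feature fires. A value of 0.088% corresponds to roughly 88 firing positions per 100,000 tokens, so this is a sparse feature. The sketch below assumes that "fires" means an activation strictly above zero. The dashboard's exact threshold and sampling corpus are not given, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where activation > threshold (assumed to be 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 1 of 10,000 token positions.
acts = [0.0] * 9999 + [2.3]
print(activation_density(acts))  # 0.01
```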