Explanation: phrases related to consequences and decision-making
Negative Logits
ported       -0.87
abled        -0.83
awaited      -0.83
planned      -0.79
scheduled    -0.79
SPONSORED    -0.77
officially   -0.77
inally       -0.74
completed    -0.73
compliant    -0.73
Positive Logits
Sometimes     1.52
Especially    1.51
Often         1.31
Hence         1.28
Therefore     1.27
Usually       1.24
Unless        1.22
Luckily       1.20
Anything      1.20
Whereas       1.17
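The two lists above rank vocabulary tokens by how strongly this feature lowers or raises their logits. The sketch below shows one way such lists could be produced. It assumes, which is common for sparse-autoencoder dashboards but not stated here, that each token's value is the dot product of the feature's decoder direction with that token's unembedding column. `logit_effects` and `top_and_bottom` are hypothetical helper names, and the sample data reuses a few values from the lists.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Hypothetical: project a feature's decoder direction onto the unembedding.

    decoder_dir[i] is the feature's weight on residual dimension i;
    unembed[i][j] is the weight from residual dimension i to vocab token j.
    Returns {token: dot(decoder_dir, unembed[:, j])}.
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    bottom = ranked[:k]                    # most negative first
    top = list(reversed(ranked[-k:]))      # most positive first
    return top, bottom


# A few values copied from the lists above, used as stand-in logit effects.
sample = {
    "ported": -0.87, "planned": -0.79, "completed": -0.73,
    "Sometimes": 1.52, "Often": 1.31, "Whereas": 1.17,
}
top, bottom = top_and_bottom(sample, 2)
print(top)     # [('Sometimes', 1.52), ('Often', 1.31)]
print(bottom)  # [('ported', -0.87), ('planned', -0.79)]
```

Sorting once and slicing both ends keeps the example short; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest` / `heapq.nsmallest`) would avoid a full sort.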
Activation density: 0.480%
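Activation density is read here as the share of tokens on which the feature fires. Treating "fires" as an activation strictly above zero is an assumption, since the threshold is not stated. Under that reading, 0.480% means roughly 4.8 tokens per thousand. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy activations over 8 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.7, 0.0, 0.0, 1.3, 0.0, 0.0]
print(activation_density(acts))  # 0.25

# Converting the reported 0.480% density into an expected count per 1,000 tokens.
print(round(0.00480 * 1000, 1))  # 4.8
```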