INDEX
Explanations
terms related to importance, value, or impact
sentences that mark a conclusion or summary
Negative Logits
Token          Logit
reversible     -0.79
nons           -0.75
allowances     -0.74
allowance      -0.73
involuntary    -0.72
unspecified    -0.72
glim           -0.71
emanc          -0.70
handc          -0.69
implied        -0.67
Positive Logits
Token            Logit
However          1.51
Yet              1.46
Its              1.43
Recently         1.39
Unfortunately    1.37
Nevertheless     1.36
But              1.33
Moreover         1.33
Besides          1.32
Sadly            1.32
Activations Density: 0.545%