INDEX
Explanations
sentences or statements indicating certainty or confidence
sentences that express conclusions or statements of fact
Negative Logits
tremend     -0.95
gobl        -0.85
challeng    -0.81
preval      -0.77
horrend     -0.76
unsus       -0.76
satell      -0.72
defe        -0.69
millenn     -0.69
princ       -0.69
Positive Logits
He           1.50
"[           1.44
Asked        1.37
Speaking     1.36
"(           1.36
"            1.30
However      1.24
Specifically 1.23
"...         1.19
"'           1.18
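The page does not say how the logit values are produced. In many SAE dashboards they come from projecting the feature's decoder direction through the model's unembedding, but that is an assumption here. The sketch below covers only the ranking step that turns a token-to-logit mapping into the two lists above. The function name `top_logits` is hypothetical, and the example data reuses a few values from the tables.

```python
import heapq

def top_logits(logit_by_token: dict[str, float], k: int = 10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Hypothetical helper: assumes the logit effects are already computed.
    """
    # nlargest returns pairs sorted from highest to lowest logit.
    positive = heapq.nlargest(k, logit_by_token.items(), key=lambda kv: kv[1])
    # nsmallest returns pairs sorted from lowest (most negative) upward.
    negative = heapq.nsmallest(k, logit_by_token.items(), key=lambda kv: kv[1])
    return positive, negative

# A few values copied from the tables above.
example = {
    "He": 1.50, "Asked": 1.37, "However": 1.24,
    "tremend": -0.95, "gobl": -0.85, "princ": -0.69,
}
pos, neg = top_logits(example, k=2)
print(pos)  # [('He', 1.5), ('Asked', 1.37)]
print(neg)  # [('tremend', -0.95), ('gobl', -0.85)]
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the full vocabulary when only the top k entries are needed.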
Activations Density 0.320%
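The density figure is read here as the percentage of token positions on which the feature fires. That definition (activation strictly above zero) is an assumption, since the page does not state it. The helper `activation_density` and the activation list below are hypothetical and chosen only to reproduce a 0.320% figure: 16 firing positions out of 5,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions where the activation exceeds `threshold`.

    Assumed definition; threshold 0.0 means "any positive activation".
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Hypothetical data: 16 nonzero activations among 5000 token positions.
acts = [0.0] * 4984 + [1.2] * 16
print(f"{activation_density(acts):.3f}%")  # 0.320%
```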