Explanations
references to negative events or actions
the repeated use of the word "the" in context
Negative Logits
again       -0.78
aja         -0.73
achus       -0.71
instead     -0.67
worn        -0.66
whilst      -0.66
ply         -0.65
ache        -0.65
tle         -0.65
whenever    -0.64
Positive Logits
aforementioned    1.22
latter            1.20
ses               1.06
same              1.05
slightest         1.00
greatest          1.00
latest            0.99
Clintons          0.95
entirety          0.90
respective        0.89
Activations Density: 0.666%