Explanations
sentences where someone is making a statement or providing information
instances of reported speech or statements made by individuals
Negative Logits
estern     -0.80
ILCS       -0.71
famous     -0.70
Synopsis   -0.70
eatured    -0.68
Rated      -0.67
Woman      -0.66
EEE        -0.65
én         -0.65
illet      -0.64
Positive Logits
nonetheless    1.08
none           1.06
otherwise      0.96
nothing        0.95
nevertheless   0.92
it             0.92
neither        0.91
there          0.90
doubts         0.84
caution        0.83
Activations Density 0.307%