Explanations
accounts or statements made by spokespersons
the word "spokesman" and its variations in reporting contexts
Negative Logits
Token       Logit
edu         -0.77
Reward      -0.73
gorge       -0.72
notations   -0.71
llah        -0.70
repaid      -0.69
ptions      -0.67
venants     -0.65
ibe         -0.64
hu          -0.64
Positive Logits
Token       Logit
bidden      0.96
instance    0.82
example     0.81
gotten      0.80
STATS       0.78
cing        0.78
managing    0.78
cers        0.77
Sierra      0.76
defending   0.75
Activations Density: 0.105%