INDEX
Explanations
declarations or official remarks made in text
the word "statement."
Negative Logits
avorite    -0.78
elsius     -0.77
cffff      -0.73
skill      -0.71
animate    -0.68
upiter     -0.68
theless    -0.66
incumb     -0.65
vag        -0.65
otin       -0.64
POSITIVE LOGITS
statement        1.05
emailed          0.98
announcing       0.91
issued           0.88
Statement        0.86
released         0.81
apologizing      0.76
acknowledging    0.73
atures           0.73
thanking         0.73
Activations Density: 0.032%