INDEX
Explanations
instances of the word "wording"
references to wording and phrasing in a document
Negative Logits
Token         Logit
elsen         -0.80
ept           -0.74
earned        -0.71
ald           -0.70
cot           -0.70
Deg           -0.69
ersen         -0.69
Democr        -0.68
oy            -0.68
ricular       -0.67
Positive Logits
Token         Logit
wording       1.30
guiActiveUn   1.05
terday        0.80
instructions  0.75
typo          0.74
text          0.72
uggest        0.72
spelling      0.71
formulation   0.69
greeting      0.68
Activations Density: 0.010%