INDEX
Explanations
mentions of written content, such as documentation or reports
articles such as "a" and "an" in the text
Negative Logits
TDs          -0.72
outweigh     -0.71
colonists    -0.69
weights      -0.69
onto         -0.67
actionGroup  -0.67
combatants   -0.67
plates       -0.66
izons        -0.65
itiz         -0.64
Positive Logits
nutshell     1.24
wake         0.84
perverse     0.83
twist        0.83
statement    0.79
brief        0.78
bizarre      0.77
footnote     0.76
flurry       0.75
courtroom    0.74
Activations Density 0.090%