INDEX
Explanations
statements or notes related to information or updates
assertive statements and personal opinions
Negative Logits
tnc           -0.78
rouse         -0.71
anooga        -0.70
manageable    -0.66
ieu           -0.66
rift          -0.64
overwhelm     -0.64
icking        -0.64
ickers        -0.63
akable        -0.63
Positive Logits
NEVER         1.35
ALSO          1.29
actually      1.18
ONLY          1.18
ALWAYS        1.11
DID           1.05
NOT           1.04
REALLY        0.98
DOES          0.98
never         0.90
Activations Density: 0.519%
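
An activation density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires. The page does not state its own definition, so the following is only a minimal sketch under that assumption. The helper name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: 1000 token positions, 5 of which are active.
example = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(example)
print(f"{density:.3%}")
```

Under this reading, a density of 0.519% would mean the feature is active on roughly 5 of every 1000 sampled tokens.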