INDEX
Explanations
phrases related to negation or prohibition
phrases indicating caution or negativity
Negative Logits
assemblies     -0.72
extracts       -0.70
naires         -0.70
accents        -0.70
itars          -0.70
backgrounds    -0.68
costumes       -0.68
Emails         -0.67
reviews        -0.67
ultras         -0.67
Positive Logits
brainer         1.22
starter         1.01
breaker         0.91
breaker         0.88
acea            0.85
kill            0.85
burner          0.82
requisite       0.81
statement       0.80
blow            0.79
Activations Density 0.307%