Explanations
highly emotive or impactful words
occurrences of the word "words."
Negative Logits
roller        -0.75
ania          -0.68
cade          -0.68
iday          -0.66
farm          -0.64
involved      -0.64
TOR           -0.64
aper          -0.64
iary          -0.63
availability  -0.63
Positive Logits
words         3.85
Words         2.71
words         2.32
Words         2.07
word          2.04
phrases       1.99
word          1.83
phrase        1.76
adject        1.70
verbs         1.55
Activations Density: 0.026%