INDEX
Explanations
phrases starting with "We"
collective pronouns indicating group actions or sentiments
Negative Logits
UD             -0.72
gratification  -0.67
totality       -0.67
LSD            -0.66
cum            -0.65
quo            -0.65
otherwise      -0.61
srfAttach      -0.61
resting        -0.59
Wikipedia      -0.57
Positive Logits
ldon           1.21
bley           1.18
eks            1.16
ird            1.12
akening        1.08
bsite          1.08
alth           1.07
've            1.07
zbollah        1.06
're            1.04
Activations Density: 0.091%
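
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.091% would mean the feature fires on roughly 9 out of every 10,000 tokens.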