Explanations
words related to satisfaction or displeasure
variations of the word "ple."
Negative Logits
Kenobi       -0.81
DERR         -0.75
ERA          -0.71
Colonial     -0.64
srfAttach    -0.63
eworld       -0.62
clinton      -0.62
Ernst        -0.62
Reviewer     -0.62
rolet        -0.62
Positive Logits
teness       1.26
mented       1.02
ple          0.97
plet         0.97
ments        0.93
asure        0.84
asures       0.81
thora        0.81
xt           0.80
asing        0.75
Activations Density: 0.017%