Explanations
phrases related to goals, achievements, promises, and benefits
terms related to crime and consequences
Negative Logits
Token        Logit
stable       -0.78
arling       -0.77
ateurs       -0.74
lip          -0.74
ovy          -0.74
annis        -0.72
untled       -0.68
orthy        -0.67
PACs         -0.66
aples        -0.65
Positive Logits
Token        Logit
refrain      0.73
imaginable   0.72
ティ         0.72
phrase       0.69
spree        0.68
lessly       0.66
IENCE        0.65
lessness     0.64
whereby      0.64
ously        0.63
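The page does not say how these logit lists are produced. SAE feature dashboards commonly project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that convention. The names `decoder_dir`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical, and the toy arrays are illustrative only, not the model's weights.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Assumed convention (not confirmed by this page): score every vocabulary
    token by the feature's direct effect on the logits, decoder_dir @ W_U,
    and return the k lowest and k highest (token, score) pairs."""
    scores = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.3, 0.1, -0.4, 0.2]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Real dashboards may add steps this sketch omits, such as folding in the final layer norm or normalizing the decoder direction. Treat the listed values as direct-effect scores under whatever convention the tool actually uses.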
Activation Density: 0.546%
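The page does not define activation density. It is usually read as the share of token positions in a sample dataset on which the feature fires, meaning its activation is above zero. Under that assumed reading, 0.546% means roughly 5 to 6 tokens in every 1,000. The function name `activation_density`, its `threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds
    `threshold` (assumed definition; 0.0 counts any positive activation as firing)."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 1,000 token positions, the feature fires on 5 of them.
acts = np.zeros(1000)
acts[[10, 200, 350, 600, 999]] = [0.4, 1.2, 0.8, 2.0, 0.1]
print(activation_density(acts))  # 0.5
```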