INDEX
Explanations
sentences describing situations or events where things are going well or fine
positive sentiments and well-being indicators
Negative Logits
incest       -0.74
Sacrifice    -0.71
jealousy     -0.70
scorn        -0.70
betrayal     -0.70
hypocrisy    -0.69
stakes       -0.68
Spons        -0.66
Vide         -0.65
Insp         -0.65
Positive Logits
alright       1.23
satisfactory  1.16
stabilized    1.13
calmed        1.11
safely        1.07
okay          1.04
healed        1.03
ok            1.02
satisf        1.02
stable        1.01
Activations Density 0.846%
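
As a rough illustration of the density figure, the sketch below assumes "activations density" means the fraction of sampled tokens on which the feature fires (activation above zero). That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions for illustration, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of sampled tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): the feature fires on 2 of 8 tokens.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(sample):.3%}")  # 25.000%
```

Under this reading, a density of 0.846% would correspond to the feature firing on roughly 8 or 9 of every 1,000 sampled tokens.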