Explanations
terms related to excessive behavior or self-gratification
terms related to indulgence and arbitration
Negative Logits
覚醒        -0.72
Doctrine    -0.70
Biological  -0.69
Shack       -0.68
Murd        -0.67
Doe         -0.66
IELD        -0.64
Butterfly   -0.64
beard       -0.64
Waterloo    -0.62
Positive Logits
rative      1.17
rator       1.08
ging        1.07
ration      1.05
acy         0.96
ates        0.96
gent        0.95
iments      0.92
rators      0.91
gregation   0.91
Activations Density 0.043%