INDEX
Explanations
hearts (♥) followed by characters in a unique format
instances of potential negative outcomes or consequences
Negative Logits
estranged    -0.64
illac        -0.62
honoured     -0.61
orchestr     -0.60
unaff        -0.60
Classics     -0.60
extrad       -0.59
Compass      -0.58
diving       -0.58
orche        -0.58
Positive Logits
³³³³³³³³            1.25
Posted              1.24
³³³³                1.20
Anonymous           1.18
³³³                 1.15
³³³³³³³³³³³³³³³³    1.10
posted              1.04
Anyway              1.02
³³                  0.97
Rated               0.95
Activations Density 0.477%