Explanations
words and phrases that indicate implicitness or nuance, particularly in discussions of mortality and health hazards
Negative Logits
للمعارف            -0.61
Билгалдахарш       -0.60
$_['               -0.57
endwhile           -0.54
✭✭                 -0.54
IntoConstraints    -0.54
ConstraintMaker    -0.54
transQ             -0.53
uxxxx              -0.52
Personensuche      -0.52
Positive Logits
implicit           0.61
thank              0.53
risks              0.52
grateful           0.50
moderate           0.49
hazards            0.48
gratitude          0.48
thanking           0.48
risk               0.48
thanked            0.47
Activations Density 1.873%