Explanations
complimentary statements or expressions of admiration
expressions of opinions or perceptions regarding personal beauty or attractiveness
Negative Logits
Token    Logit
.<       -0.78
!.       -0.77
+.       -0.72
$.       -0.70
."[      -0.67
%.       -0.64
.</      -0.63
.","     -0.62
.''.     -0.62
.""      -0.62
Positive Logits
Token       Logit
)]          0.73
)]          0.61
-)          0.56
?)          0.55
?),         0.55
oneliness   0.54
?)          0.54
ado         0.54
iru         0.52
infall      0.51
Activations Density: 1.721%