INDEX
Explanations
expressions related to emotions or attitudes of sincerity or dedication
terms related to kindness or warmth versus coldness and cruel behavior
Negative Logits
Downloadha   -0.83
ggies        -0.71
assic        -0.66
ICH          -0.64
andra        -0.64
ICO          -0.64
abases       -0.63
MAT          -0.61
JO           -0.60
Indust       -0.60
Positive Logits
hearted      1.32
ness         0.86
heartedly    0.82
tons         0.74
endeavour    0.72
terness      0.70
nesses       0.69
glances      0.68
acters       0.68
altru        0.68
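The two lists above are the tokens with the most negative and most positive values for this feature. The page does not say how those values are computed; assuming each token already has a single logit-effect score, ranking them is a sort plus a sign filter. The sketch below shows that step only. The function name `top_logits` and the sample dictionary (a subset of the values listed above) are hypothetical illustrations, not part of any dashboard API.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, value) pairs.

    Assumption: `effects` maps each token to a precomputed logit-effect score.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    # Most negative first, keeping only values below zero.
    negative = [kv for kv in ranked if kv[1] < 0][:k]
    # Most positive first, keeping only values above zero.
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]
    return positive, negative


# Hypothetical sample: a few of the token/value pairs shown on this card.
SAMPLE = {
    "hearted": 1.32,
    "ness": 0.86,
    "heartedly": 0.82,
    "Downloadha": -0.83,
    "ggies": -0.71,
    "assic": -0.66,
}

if __name__ == "__main__":
    pos, neg = top_logits(SAMPLE, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Filtering by sign keeps a token from appearing in the wrong list when a feature has fewer than `k` strongly negative or positive entries.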
Activation Density: 0.006%