INDEX
Explanations
instances of emotional expressions and social interactions
NEGATIVE LOGITS
/Peak         -0.16
κά            -0.15
egas          -0.14
ACHI          -0.14
humiliation   -0.14
åĭ            -0.14
incare        -0.14
.LoadScene    -0.14
defer         -0.13
urovision     -0.13
POSITIVE LOGITS
complain      0.62
complaint     0.61
complaining   0.58
complaints    0.57
complains     0.52
Complaint     0.51
complained    0.50
lament        0.40
mo            0.38
grip          0.36
Activations Density 0.440%