Explanations
expressions of emotional experiences and reflections
Negative Logits
attract           -0.18
attracted         -0.15
ple               -0.15
Shapes            -0.15
uite              -0.15
avou              -0.15
Ñĥдив             -0.14
unp               -0.14
à¤Ĩà¤ķर           -0.14
attractiveness    -0.14
Positive Logits
leave             0.22
send              0.21
left              0.20
sends             0.19
leaving           0.19
leave             0.19
leaves            0.18
left              0.18
Send              0.18
Send              0.18
Activations Density 0.124%