INDEX
Explanations
words and phrases related to sympathy and compassion
Negative Logits
yon       -0.18
erness    -0.16
nun       -0.16
mund      -0.16
ye        -0.16
hips      -0.16
wick      -0.16
cher      -0.16
party     -0.15
ít        -0.15
Positive Logits
ically    0.27
posium    0.24
osate     0.22
indrical  0.20
ical      0.19
atik      0.19
dney      0.19
draul     0.18
soever    0.18
etically  0.17
Activations Density: 0.110%