INDEX
Explanations
phrases related to empathy and understanding the feelings and experiences of others
Negative Logits
erville    -0.17
isk        -0.16
lech       -0.15
aled       -0.14
rones      -0.14
dı         -0.14
217        -0.13
arendra    -0.13
ugh        -0.13
aren       -0.13
Positive Logits
therein    0.16
.ix        0.16
Sole       0.15
its        0.15
annya      0.15
ãģĿãģĵ     0.15
thereof    0.15
lemn       0.14
sole       0.14
Gonzalez   0.14
Activations Density: 0.278%