INDEX
Explanations
themes of kindness and helping others
Negative Logits
radial      -0.15
SHA         -0.15
terior      -0.14
يز          -0.14
ред         -0.14
.Rad        -0.13
ere         -0.13
contempt    -0.13
013         -0.13
eter        -0.13
Positive Logits
lon         0.19
lements     0.18
uong        0.16
ロン        0.16
ilden       0.16
_lon        0.16
tik         0.15
aidu        0.15
ерти        0.14
.ravel      0.14
Activations Density: 0.468%