Explanations
moments of high emotional intensity or impactful statements
Negative Logits
Token      Logit
ugi        -0.15
atan       -0.14
Salad      -0.14
Tarih      -0.14
ifornia    -0.13
Perl       -0.13
Dul        -0.13
Lâm        -0.13
Dawn       -0.13
ubl        -0.13
Positive Logits
Token      Logit
rve        0.17
lue        0.16
zburg      0.15
eker       0.15
etros      0.15
fcn        0.15
scand      0.14
ource      0.14
arlar      0.14
asser      0.14
Activation Density: 0.061%