Explanations
emotions and their expressions, particularly those related to suffering and regret
Negative Logits
utow        -0.17
byt         -0.15
itaire      -0.15
imoto       -0.15
lak         -0.15
Fare        -0.14
ouri        -0.14
nez         -0.14
/sidebar    -0.14
аци         -0.14
Positive Logits
ful         0.98
fully       0.79
full        0.78
FUL         0.71
fulness     0.69
FULL        0.65
-full       0.56
ful         0.52
Full        0.48
eful        0.47
Activations Density 0.079%