Explanations
expressions of humor or lightheartedness
Negative Logits
)";        -0.80
           -0.77
"],        -0.75
:          -0.72
"},        -0.71
.",        -0.70
'>         -0.70
Revenir    -0.69
"),        -0.68
)");       -0.67
Positive Logits
:)          0.93
:).         0.76
:-)         0.76
;)          0.70
🙂          0.69
Vaya        0.68
phenotypes  0.68
:)          0.65
:))         0.64
;-)         0.64
Activations Density: 0.080%