Explanations
phrases indicating inability or restrictions
Negative Logits
Token               Logit
(no visible token)  -1.16
");                 -1.07
(no visible token)  -0.99
";                  -0.98
"};                 -0.94
]--;                -0.93
…..                 -0.90
");                 -0.88
/>";                -0.88
”;                  -0.87
Positive Logits
Token               Logit
♪                   1.11
♪                   0.98
Mm                  0.82
gonna               0.82
Uh                  0.71
GONNA               0.70
ս                   0.70
somethin            0.69
Mm                  0.68
♪♪                  0.65
Activations Density: 0.067%