INDEX
    Explanations

    phrases indicating inability or restrictions

    Negative Logits
    (blank)   -1.16
    ");       -1.07
    (blank)   -0.99
    ";        -0.98
    "};       -0.94
    ]--;      -0.93
     …..      -0.90
     ");      -0.88
     />";     -0.88
    ”;        -0.87
    Positive Logits
     ♪              1.11
    (not captured)  0.98
     Mm             0.82
     gonna          0.82
     Uh             0.71
     GONNA          0.70
    ս               0.70
     somethin       0.69
    Mm              0.68
    ♪♪              0.65
    Act Density 0.067%

    No Known Activations