INDEX
    Explanations

    phrases indicating certainty or emphasis

    instances of negation or refusal in various contexts

    Negative Logits
    Token                   Logit
    " RAD"                  -0.51
    " guiActiveUnfocused"   -0.51
    "creen"                 -0.49
    " scattering"           -0.48
    " shroud"               -0.48
    " cottage"              -0.46
    " lodging"              -0.46
    " Manhattan"            -0.46
    " scatter"              -0.46
    " shack"                -0.46
    Positive Logits
    Token   Logit
    ¬       0.82
    £       0.78
    ¡       0.77
    ¹       0.77
    Ĵ       0.77
    ı       0.74
    ¼       0.74
    º       0.73
    ¢       0.71
    ł       0.69
    Activation Density: 0.516%

    No Known Activations