INDEX
    Explanations

    phrases related to discussion or debate

    Negative Logits
     inappro    -0.84
     deff       -0.77
     fuf        -0.77
     increa     -0.76
     purcha     -0.76
     iirc       -0.76
     berea      -0.75
     attemp     -0.74
     Lmao       -0.74
     Wtf        -0.73
    Positive Logits
    ?}          1.26
    ?</         1.26
    ?           1.25
    ?\n         1.25
    ?”          1.24
    ?");        1.23
    ?’          1.23
    ؟           1.20
    }?          1.18
    ?"          1.17
    Activation density: 0.638%

    No Known Activations