    Explanations

    phrases expressing thoughts or reflections, particularly those that are self-critical or clichéd

    Negative Logits
    Token      Logit
    "ntax"     -0.16
    "ustos"    -0.15
    "erah"     -0.14
    "antz"     -0.14
    "482"      -0.14
    "виж"      -0.14
    "PLUS"     -0.13
    "egt"      -0.13
    "MLS"      -0.13
    "이스"     -0.13
    Positive Logits
    Token      Logit
    " but"     0.23
    " nhưng"   0.18
    "but"      0.18
    " pero"    0.17
    "_but"     0.17
    "，但"     0.16
    "oeff"     0.16
    " но"      0.15
    "μισ"      0.15
    "지만"     0.15
    Activation Density: 0.089%

    No Known Activations