    Explanations

    proper nouns and technical terms

    Negative Logits
        Token          Value
        "s"            1.09
        " it"          0.95
        " ("           0.79
        " a"           0.75
        "igating"      0.75
        ":"            0.72
        " idiots"      0.71
        "س"            0.70
        " gasped"      0.70
        " finisher"    0.68
    Positive Logits
        Token          Value
        "ן"            1.04
        "ской"         0.83
        "ת"            0.82
        "ν"            0.81
                       0.80
                       0.79
                       0.79
        "ında"         0.77
        "ised"         0.75
                       0.73
    Activation Density: 0.030%

    No Known Activations