    Negative Logits
        " T"            -0.07
        " mythical"     -0.06
        "Tags"          -0.06
        " Polar"        -0.06
        "TOT"           -0.06
        " состояния"    -0.06
        "’t"            -0.06
        "ूत"            -0.06
        "omat"          -0.06
        ".ravel"        -0.06
    Positive Logits
        " he"           0.14
        " He"           0.13
        ".He"           0.11
        " him"          0.11
        " she"          0.11
        " She"          0.10
        "He"            0.10
        "-he"           0.09
        "HE"            0.09
        "he"            0.09
    Activation Density: 0.444%

    No Known Activations