    Explanations

    its / possessive pronoun

    Negative Logits
     ۴      0.93
    ORT     0.86
    URUK    0.84
     ۵      0.83
    ESS     0.82
    AMBI    0.82
            0.81
    ALL     0.79
    USE     0.78
     ສະ     0.78
    Positive Logits
    the     1.11
    x       1.09
     in     1.08
    w       1.02
    D       1.02
            0.99
    d       0.96
    s       0.95
    t       0.95
    r       0.94
    Act Density 0.002%

    No Known Activations