    Explanations

    article followed by noun

    Negative Logits
    Token        Logit
    "in"         1.58
    (missing)    1.46
    "ل"          1.18
    (missing)    1.02
    "es"         0.99
    "ан"         0.95
    "ir"         0.92
    "at"         0.91
    "is"         0.90
    "isches"     0.90
    Positive Logits
    Token        Logit
    " "          1.06
    (missing)    0.79
    "^{-}"       0.71
    (missing)    0.70
    "ן"          0.70
    " is"        0.67
    " т"         0.67
    (missing)    0.66
    " t"         0.65
    (missing)    0.65
    Activation Density: 1.161%

    No Known Activations