    Negative Logits
    Token        Logit
    "21"         -0.08
    "0"          -0.08
    "20"         -0.07
    "24"         -0.07
    "into"       -0.07
    "attro"      -0.07
    (not shown)  -0.07
    "2"          -0.07
    "اى"         -0.06
    "61"         -0.06
    Positive Logits
    Token        Logit
    " Ger"       0.18
    " ger"       0.15
    "Ger"        0.14
    " Hen"       0.10
    " GER"       0.10
    "ger"        0.09
    " jer"       0.08
    " Jer"       0.08
    " Germ"      0.08
    " Gent"      0.08
    Activation Density: 0.009%

    No Known Activations