INDEX
    Explanations

    phrases indicating involvement or participation in actions or events

    Negative Logits
    Token       Logit
    "urat"      -0.17
    "ica"       -0.15
    "_apply"    -0.15
    "770"       -0.14
    "olia"      -0.14
    "anje"      -0.14
    " Dre"      -0.14
    "opo"       -0.14
    " <+"       -0.14
    "arn"       -0.14
    Positive Logits
    Token       Logit
    " by"       0.23
    " bợi"      0.20
    " oleh"     0.19
    "lung"      0.16
    " توسط"     0.15
    "zes"       0.14
    "*)_"       0.14
    "rung"      0.14
    "MPI"       0.14
    " layer"    0.14
    Act Density 0.337%

    No Known Activations