    Explanations
    Negative Logits
    Token             Logit
    " semelhante"     -0.08
    " vs"             -0.08
    " Between"        -0.07
    " ähnlich"        -0.07
    " Other"          -0.07
    (not shown)       -0.07
    "aa"              -0.07
    " Generic"        -0.07
    "ban"             -0.07
    " resembling"     -0.07
    Positive Logits
    Token             Logit
    " implicitly"     0.14
    "implicitly"      0.12
    " vanzelf"        0.12
    " implicit"       0.12
    " جيڪڏهن"         0.10
    "implicit"        0.10
    " automatically"  0.10
    "Implicit"        0.10
    " prerequisite"   0.10
    " gült"           0.10
    Activation Density: 0.051%

    No Known Activations