    Explanations

    prepositions with specific words

    Negative Logits
        rediscovered     0.48
        devenue          0.45
        습니다           0.42
        characters       0.41
        effectivement    0.41
        Pferde           0.41
        personajes       0.40
        کردار            0.40
        égard            0.40
        ходит            0.39

    Positive Logits
        from       0.84
        with       0.79
        by         0.77
        in         0.77
        في         0.76
        via        0.72
        through    0.72
        on         0.71
        على        0.69
        från       0.67

    Activation Density: 0.019%

    No Known Activations