INDEX
    Explanations

    car followed by context

    Negative Logits
    Token                 Value
    ي                     1.68
    יא                    1.66
    (token not shown)     1.52
    އ                     1.52
    (token not shown)     1.42
    (token not shown)     1.41
    ج                     1.39
    ك                     1.39
    ق                     1.38
    י                     1.38
    Positive Logits
    Token                 Value
    ong                   1.30
    ari                   1.23
    í                     1.21
    ized                  1.20
    ier                   1.18
    ig                    1.16
    ak                    1.16
    r                     1.12
    2                     1.12
    ter                   1.11
    Act Density 0.032%

    No Known Activations