INDEX
    Explanations

    place names that are substrings

    Negative Logits
    "א"          0.54
    "на"         0.47
    "trashItem"  0.46
    " וא"        0.45
    "on"         0.43
    "ج"          0.43
    "ik"         0.41
    "نا"         0.41
    "墓志"       0.40
    "ח"          0.40
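    Positive Logits
    "'"                0.55
    (token not shown)  0.43
    (token not shown)  0.40
    " to"              0.40
    "ę"                0.39
    (token not shown)  0.39
    " soothing"        0.38
    (token not shown)  0.38
    (token not shown)  0.37
    (token not shown)  0.36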
    Act Density 0.052%

    No Known Activations