INDEX
    Explanations

    identifying types and categories

    Negative Logits
     "  0.48
    U  0.46
    T  0.44
    G  0.44
    ug  0.42
    =  0.41
    :  0.40
    W  0.40
    com  0.39
    R  0.39
    Positive Logits
     வரைய  0.46
    (no token shown)  0.46
    yatiti  0.44
     طی  0.44
    是谁  0.44
    پے  0.44
    (no token shown)  0.43
    ٰی  0.43
    نیا  0.43
     بخشی  0.43
    Act Density 0.001%

    No Known Activations