INDEX
    Explanations

    occurrence count of one

    Negative Logits
    ing  0.66
     of  0.63
     mengatakan  0.62
    )  0.62
    िंग  0.57
    𝗮  0.57
     metaphors  0.56
    ठबंधन  0.55
     that  0.54
     countries  0.54
    Positive Logits
     Twice  0.74
     twice  0.64
    两次  0.61
    ת  0.61
    0.60
    ри  0.60
    ים  0.60
    twice  0.59
    خ  0.59
    для  0.57
    Act Density 0.008%

    No Known Activations