INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Value
    h             0.94
    خ             0.90
    p             0.83
     در           0.82
     in           0.78
                  0.73
    r             0.71
    inu           0.64
     \            0.61
    ح             0.61
    POSITIVE LOGITS
    Token         Value
    '             0.90
    ="            0.84
    ные           0.79
                  0.79
    us            0.79
    ना            0.76
    the           0.76
     resistenza   0.76
    ção           0.75
     Réponses     0.75
    Activation Density: 0.013%

    No Known Activations