    Negative Logits
    ion         0.75
    ;           0.73
    ियां         0.73
    at          0.71
    ka          0.71
    ig          0.68
    nte         0.67
    #,          0.67
    ago         0.67
    ach         0.66
    Positive Logits
    ه           1.02
    س           0.98
    на          0.97
    (not shown) 0.89
    я           0.88
    то          0.86
    (not shown) 0.86
    (not shown) 0.85
    ли          0.84
    atasaray    0.82
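    Top-logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (a logit-lens style readout). This page does not say how these particular values were computed, so the following is a minimal sketch under that assumption; `top_logits`, `w_dec_row`, `w_u`, and `vocab` are hypothetical names, not a library API.

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Project one feature's decoder direction through the unembedding matrix.

    Assumption (hypothetical setup, not taken from this page):
      w_dec_row: shape (d_model,), the feature's decoder vector.
      w_u:       shape (d_model, d_vocab), the model's unembedding matrix.
      vocab:     list of d_vocab token strings.
    Returns (most_negative, most_positive) as lists of (token, value) pairs.
    """
    logits = w_dec_row @ w_u  # direct effect on each vocabulary logit, shape (d_vocab,)
    order = np.argsort(logits)  # indices sorted from lowest to highest logit
    most_negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return most_negative, most_positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits(
    np.array([1.0, 0.0]),
    np.array([[0.5, -1.0, 2.0], [0.0, 0.0, 0.0]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

    In the toy example the projected logits are 0.5, -1.0, and 2.0, so token "b" is the most suppressed and token "c" the most promoted.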
    Activation density: 0.004%

    No known activations.
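    An activation density of 0.004% means the feature fired on roughly 4 of every 100,000 sampled tokens. A minimal sketch of that calculation, assuming density is the fraction of token positions whose activation exceeds a threshold of zero; the `activation_density` helper and its `threshold` parameter are hypothetical and not taken from this page.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Return the fraction of token positions where the feature is active.

    acts: 1-D array of one feature's activations over a sample of tokens.
    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold`.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# 4 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:4] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

    With 4 nonzero activations out of 100,000 tokens the function returns 4e-05, which prints as 0.004%.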