    NEGATIVE LOGITS
    Token (quoted)        Logit
    "’."                  1.03
    ".’"                  1.02
    "ן"                   0.95
    (no visible token)    0.89
    ".</"                 0.84
    "land"                0.79
    "nn"                  0.78
    "’।"                  0.78
    "’;"                  0.78
    "’?"                  0.76
    POSITIVE LOGITS
    Token (quoted)        Logit
    "ور"                  0.93
    "ва"                  0.93
    " comfortable"        0.93
    " uncomfortable"      0.92
    "-"                   0.87
    "ل"                   0.86
    (no visible token)    0.82
    "/"                   0.81
    "ди"                  0.80
    "როს"                 0.79
    Activation Density: 0.021%

    No Known Activations