    Negative Logits
    Token                 Value
    "I"                   0.59
    " I"                  0.54
    " a"                  0.53
    "A"                   0.50
    "gelijk"              0.49
    " Dorchester"         0.46
    (no visible token)    0.44
    " Please"             0.44
    "pathetic"            0.44
    " A"                  0.43
    Positive Logits
    Token                 Value
    "י"                   0.57
    "ва"                  0.53
    "ן"                   0.52
    "ي"                   0.51
    "ٹ"                   0.50
    (no visible token)    0.49
    "ين"                  0.49
    "িল"                  0.49
    "λο"                  0.47
    "াদেশ"                0.47
    Activation Density: 0.042%

    No Known Activations