    NEGATIVE LOGITS
    Token     Logit
    "ка"      1.14
    "ك"       0.98
    "at"      0.86
    "op"      0.83
    "as"      0.74
    "да"      0.71
    "il"      0.70
    " в"      0.67
    " in"     0.65
              0.65
    POSITIVE LOGITS
    Token     Logit
    " been"   1.36
    "ע"       1.12
    " BEEN"   1.10
    " has"    0.95
    " Been"   0.95
    "been"    0.94
    "Been"    0.87
    " "       0.84
    "ס"       0.79
    "س"       0.79
    Activation Density: 0.804%

    No Known Activations