INDEX
    Explanations

    conceptual abstraction and suffixes

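    Negative Logits
        (token not rendered)          0.59
        "you"                         0.51
        " " (whitespace)              0.50
        "ين" (Arabic ending -īn)      0.44
        "،" (Arabic comma)            0.44
        (token not rendered)          0.43
        (token not rendered)          0.43
        " (),"                        0.42
        "ما" (Arabic "what"/"not")    0.41
        ","                           0.40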
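    Positive Logits
        "and"                         0.58
        "その他の" (Japanese "other")   0.52
        "ud"                          0.52
        "4"                           0.51
        (token not rendered)          0.51
        " других" (Russian "other")   0.51
        "2"                           0.49
        " τους" (Greek "them/their")  0.48
        " altre" (Italian "other")    0.47
        "5"                           0.47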
    Activation Density 7.017%

    No Known Activations