INDEX
    Explanations

    personal pronouns

    Negative Logits
    " build"      -0.06
    " actor"      -0.06
    "انون"        -0.06
    " bright"     -0.06
    "лав"         -0.06
    " Blue"       -0.06
    "Window"      -0.06
    "accuracy"    -0.06
    "110"         -0.06
    (token not shown)  -0.06
    Positive Logits
    "_Format"     0.07
    "istra"       0.07
    " Vill"       0.06
    " Cyril"      0.06
    ":'',"        0.06
    (token not shown)  0.06
    "ونية"        0.06
    " moder"      0.06
    " नई"         0.06  (Hindi, "new")
    " Encoder"    0.06
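    The two lists above rank tokens by how strongly the feature pushes their logits down or up. The sketch below shows only that ranking step, under the assumption that the per-token logit effects are already available (a common way to obtain them is projecting the feature's decoder direction through the model's unembedding, which is not shown here and may differ from how this dashboard computes them). The function `top_logits` and the toy token values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (most_negative, most_positive) lists of (token, effect) pairs.

    Hypothetical helper: `effects` maps each token string to the feature's
    (assumed precomputed) effect on that token's logit.
    """
    # Sort once in ascending order of effect.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    # The k smallest effects are the strongest negative logits.
    most_negative = ranked[:k]
    # Reverse the same ordering to get the k largest (strongest positive) effects.
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy values for illustration only; not taken from this feature.
    toy = {"a": -0.3, "b": 0.5, "c": 0.1, "d": -0.1, "e": 0.0}
    neg, pos = top_logits(toy, k=2)
    print(neg)  # [('a', -0.3), ('d', -0.1)]
    print(pos)  # [('b', 0.5), ('c', 0.1)]
```

    If the vocabulary has fewer than 2k tokens the two lists can overlap; a real dashboard would draw from the full vocabulary, so this is not an issue in practice.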
    Activation Density 0.113%
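    Activation density is usually read as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the actual threshold and the evaluation dataset behind the 0.113% figure are not given here. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper; assumes one activation value per token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Illustration: 113 active positions out of 100,000 gives 0.113%.
    acts = [1.0] * 113 + [0.0] * (100_000 - 113)
    print(activation_density(acts))  # 0.113
```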

    No Known Activations