    Negative Logits
    Token        Logit
    "Neil"       -0.07
    " quam"      -0.06
    " chess"     -0.06
    " Leo"       -0.06
    ".');↵"      -0.06
    " thứ"       -0.06
    " Однак"     -0.06
    "Cars"       -0.06
    " Ted"       -0.06
    "Fed"        -0.05
    Positive Logits
    Token        Logit
    "BOR"        0.07
    "activ"      0.06
    " bibli"     0.06
    "DEVICE"     0.06
    ".cd"        0.06
    "JNI"        0.06
    " καλύ"      0.06
    "_include"   0.06
    "ительное"   0.06
    "ombine"     0.06
    Activation Density: 0.003%

    No Known Activations