INDEX
    Explanations
    Negative Logits
    "(com"           -0.07
    " Geschichte"    -0.07
    "pire"           -0.06
    "blo"            -0.06
    "que"            -0.06
    "oble"           -0.06
    "qe"             -0.06
    "elle"           -0.06
    "ئ"              -0.06
    "ombres"         -0.06
    Positive Logits
    "Handler"        0.08
    " handler"       0.08
    " vendor"        0.07
    " agent"         0.07
    "-h"             0.07
    " agents"        0.07
    " Hart"          0.07
    " الأخ"          0.07
    ".processor"     0.07
    "ث"              0.07
    Activation Density: 0.006%

    No Known Activations