    Negative Logits
    " Examples"        -0.07
    " equipe"          -0.07
    "тою"              -0.07
    " compound"        -0.07
    " проте"           -0.06
    ".Index"           -0.06
    "oteca"            -0.06
    "groupon"          -0.06
    [token missing]    -0.06
    " nhánh"           -0.06
    Positive Logits
    " wars"            0.08
    " الحرب"           0.07
    " war"             0.06
    " War"             0.06
    "_CLICKED"         0.06
    ".window"          0.06
    "_fecha"           0.06
    "(clicked"         0.06
    "nb"               0.06
    "_OPT"             0.06
    Activation Density: 0.015%

    No Known Activations