    NEGATIVE LOGITS
     outras
    -0.06
    ildren
    -0.06
     лег
    -0.06
    еся
    -0.06
     raining
    -0.06
    -0.06
    -0.06
     Dale
    -0.05
    �行
    -0.05
     شهرد
    -0.05
    POSITIVE LOGITS
     WHETHER
    0.13
    (store
    0.06
    _me
    0.06
    compiler
    0.06
    erspective
    0.06
    athom
    0.06
    ,set
    0.06
     McCartney
    0.06
     ngăn
    0.06
    inqu
    0.06
    Activation Density: 0.000%

    No Known Activations