    Negative Logits
     Telephone    -0.07
    PHONE         -0.07
    tribution     -0.06
    TestClass     -0.06
     lawsuits     -0.06
    никами        -0.06
    _street       -0.06
                  -0.06
    Workers       -0.06
                  -0.06
    Positive Logits
     bağ          0.07
     snippet      0.07
     ponto        0.07
    (hex          0.06
     `'           0.06
     محاس         0.06
                  0.06
                  0.06
    出来          0.06
     limite       0.06
    Activation Density: 0.006%

    No Known Activations