INDEX
    Explanations

    negative or limiting phrases

    Negative Logits
    甚至
    -0.21
     zwar
    -0.21
     sice
    -0.20
     even
    -0.20
     Even
    -0.18
     but
    -0.18
     thậm
    -0.18
    even
    -0.18
    iola
    -0.16
    虽然
    -0.16
    Positive Logits
     necessarily
    0.21
     enough
    0.16
    Traversal
    0.15
    vida
    0.15
    theless
    0.14
    υνα
    0.14
     consequ
    0.14
    essler
    0.14
    ilden
    0.14
    daq
    0.14
    Act Density 0.126%

    No Known Activations