INDEX
    Explanations

    questions and negative statements

    Negative Logits
    "antMatchers"    -1.12
    " שוליים"        -0.99
    " itſelf"        -0.88
    "Composable"     -0.88
    " Rhonda"        -0.88
    " Grau"          -0.88
    "InputBorder"    -0.86
    " Houſe"         -0.86
    " Ise"           -0.86
    " houſe"         -0.85
    Positive Logits
    "did"      1.73
    " Did"     1.70
    " did"     1.67
    " DID"     1.64
    "Did"      1.54
    "DID"      1.48
    " Didi"    1.21
    " didn"    1.04
    " Didn"    1.01
    " done"    0.98
    Activation Density: 0.091%

    No Known Activations