INDEX
    Explanations

    the presence of the word "exist" and its variations in various contexts

    Negative Logits
    Token         Logit
    "s"           -0.70
    "ebs"         -0.52
    "_{-\"        -0.52
    " WS"         -0.52
    "ws"          -0.52
    " Daniels"    -0.49
    "EDES"        -0.47
    " Autos"      -0.47
    " כס"         -0.46
    "gras"        -0.46
    Positive Logits
    Token         Logit
    "contain"     0.78
    "Contain"     0.74
    "exist"       0.66
    "depend"      0.65
    " depend"     0.65
    " Contain"    0.65
    " contain"    0.64
    "Depend"      0.63
    "Exist"       0.62
    " Depend"     0.57
    Act Density 0.024%

    No Known Activations