    Explanations

    instances where things are being prevented or prohibited

    instances of the word "stop" and its variations in various contexts

    Negative Logits
    Token           Logit
    "Sov"           -0.86
    "iosyncr"       -0.80
    "ammy"          -0.78
    "ighth"         -0.77
    "ographies"     -0.76
    "orth"          -0.76
    "orthy"         -0.75
    "ramid"         -0.75
    "olesc"         -0.70
    "ocene"         -0.70

    Positive Logits
    Token           Logit
    " bothering"    0.93
    "gap"           0.91
    " raining"      0.87
    " bleeding"     0.83
    " smoking"      0.83
    "watching"      0.78
    "watch"         0.77
    " worrying"     0.76
    " cheating"     0.75
    " trafficking"  0.74

    Activation Density: 0.038%

    No Known Activations