INDEX
    Explanations

    words related to actions or concepts that involve deterring, regulating, or deciding on something

    words related to prevention or deterrence

    Negative Logits
     appropri      -0.60
     sucks         -0.58
     viz           -0.57
     HAL           -0.56
     coworkers     -0.55
     Haz           -0.55
    Posts          -0.54
     quirks        -0.54
     subordinates  -0.54
     seams         -0.54
    Positive Logits
    red     1.67
    ered    1.58
    ring    1.52
    rer     1.49
    mented  1.44
    ted     1.39
    ering   1.37
    ance    1.37
    anced   1.35
    ant     1.34
    Act Density 0.439%

    No Known Activations