INDEX
    Explanations

    the word "OK" with a strong activation value

    expressions indicating a sense of approval or acceptance

    Negative Logits
    "urther"          -0.60
    "ience"           -0.59
    "cum"             -0.59
    " guiActiveUn"    -0.58
    "Shadow"          -0.57
    "eries"           -0.57
    "ensis"           -0.55
    "cence"           -0.55
    "rowth"           -0.55
    " latent"         -0.54
    Positive Logits
    " OK"             3.95
    " ok"             2.70
    " okay"           2.54
    "OK"              2.32
    " alright"        2.09
    " Okay"           1.81
    " Ok"             1.78
    "Ok"              1.53
    "Okay"            1.47
    " Alright"        1.34
    Activation Density: 0.005%

    No Known Activations