INDEX
    Explanations

    words related to physical discomfort or harm

    the word "from" used in various contexts

    Negative Logits
    "ratulations"   -0.76
    "sic"           -0.70
    "ierrez"        -0.69
    "sat"           -0.69
    "iddles"        -0.68
    "uri"           -0.68
    "busters"       -0.67
    "unes"          -0.67
    "isode"         -0.65
    "trump"         -0.64
    Positive Logits
    " afar"          1.57
    " whence"        1.17
    " anywhere"      0.90
    " thence"        0.88
    " elsewhere"     0.87
    " everywhere"    0.87
    " inside"        0.84
    " wherever"      0.84
    " somewhere"     0.83
    " across"        0.81
    Act Density 0.155%

    No Known Activations