INDEX
    Explanations

    a variety of words related to observation or surveillance

    Negative Logits
    ãĥ´          -0.79
    esan         -0.72
    enture       -0.70
    bably        -0.69
    eno          -0.67
    lishes       -0.66
    xual         -0.64
    afort        -0.64
    ãĤ¨ãĥ«       -0.63
    wealth       -0.62
    Positive Logits
    dog          1.32
    tower        1.26
    dogs         1.26
     Watching    1.09
     helpless    0.95
     watching    0.95
     attent      0.89
    watch        0.89
     closely     0.88
    opes         0.88
    Activation Density: 1.672%

    No Known Activations