INDEX
    Explanations

    words related to observation and awareness

    NEGATIVE LOGITS
    isle        -0.18
    oust        -0.16
    utex        -0.16
    lers        -0.16
    lfw         -0.15
    -legged     -0.14
     Andrews    -0.14
    >NN         -0.14
    ãģ°         -0.14
    /devices    -0.14
    POSITIVE LOGITS
    vation      0.21
    å¯Ł         0.21
    asion       0.18
    (obs        0.18
    ably        0.17
    235         0.17
    ãĥ¥         0.17
    ances       0.17
    ant         0.16
    yer         0.16
    Activation Density: 0.024%

    No Known Activations