INDEX
    Explanations

    phrases indicating ability or possibility

    Negative Logits
    "Reloaded"      -0.65
    " Lug"          -0.61
    "erville"       -0.60
    "NetMessage"    -0.56
    "abin"          -0.56
    "hess"          -0.55
    "REF"           -0.54
    "atti"          -0.54
    "alloc"         -0.53
    "entin"         -0.53
    Positive Logits
    " guessed"      1.04
    " attest"       0.90
    " imagine"      0.88
    " guess"        0.82
    " infer"        0.81
    "seen"          0.77
    " doubtless"    0.73
    " see"          0.73
    " noticing"     0.72
    " notice"       0.72
    Activation Density: 0.041%

    No Known Activations