INDEX
    Explanations

    expressions of personal experience and introspection

    Negative Logits
    "seen"        -0.16
    " barley"     -0.15
    "bout"        -0.15
    "elige"       -0.14
    "ounge"       -0.14
    "ньо"         -0.14
    "utin"        -0.14
    "ognition"    -0.14
    "apg"         -0.14
    " Enlight"    -0.14
    Positive Logits
    " suspect"         0.21
    " dim"             0.18
    " Sus"             0.17
    "annis"            0.17
    " worry"           0.17
    " increasingly"    0.17
    " privilege"       0.17
    " lux"             0.17
    " habit"           0.16
    " contextual"      0.16
    Activation Density: 0.420%

    No Known Activations