INDEX
    Explanations
    Negative Logits
        Token            Logit
        " be"            -0.08
        " are"           -0.08
        " were"          -0.07
        " attract"       -0.07
        (not shown)      -0.07
        "alo"            -0.06
        "chin"           -0.06
        "V"              -0.06
        "VI"             -0.06
        " Navigation"    -0.06
    Positive Logits
        Token            Logit
        " Does"          0.14
        "Does"           0.13
        " does"          0.12
        "does"           0.11
        " DOES"          0.10
        ".Does"          0.09
        " Doe"           0.09
        (not shown)      0.07
        "르게"           0.07
        " helps"         0.07
    Activation Density: 0.029%

    No Known Activations