    Negative logits
    " Gilbert"    -0.08
    " Naut"       -0.08
    " Hockey"     -0.08
    " Isa"        -0.07
    " Ferrari"    -0.07
    " Idee"       -0.07
    " Idea"       -0.07
    "apsed"       -0.07
    "obt"         -0.07
    "udar"        -0.07
    Positive logits
    (token missing)    0.08
    " esm"             0.08
    (token missing)    0.07
    " honestly"        0.07
    (token missing)    0.07
    "程度"             0.07
    " gifted"          0.07
    (token missing)    0.07
    " Schwartz"        0.07
    (token missing)    0.07
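The page does not say how these logit values were computed. A common convention on sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive entries. The sketch below assumes that convention; `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names, not part of any particular library.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive per-token logit effects.

    Assumptions (hypothetical, for illustration only):
      decoder_dir: shape (d_model,), the feature's decoder vector.
      W_U:         shape (d_model, n_vocab), the model's unembedding matrix.
      vocab:       list of n_vocab token strings, indexed like W_U's columns.
    """
    # Direct effect of the feature direction on every output logit.
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (n_vocab,)
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
    decoder_dir = np.array([1.0, 0.0])
    W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                    [9.0, 9.0, 9.0, 9.0]])
    vocab = ["a", "b", "c", "d"]
    neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Keeping the token strings verbatim (including any leading space) matters, because tokenizers treat " Idea" and "Idea" as different tokens.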
    Activation density: 0.001%
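Activation density is usually reported as the fraction of token positions on which the feature's activation is above zero. Assuming that definition (the page does not state it), a minimal sketch follows; `activation_density` and its `threshold` parameter are hypothetical names.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: density = count(activation > threshold) / number of positions.
    """
    acts = np.asarray(activations)
    if acts.size == 0:
        return 0.0
    return float(np.count_nonzero(acts > threshold)) / acts.size


if __name__ == "__main__":
    # One active position out of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[42] = 1.3
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```

Under this definition, a density of 0.001% corresponds to one active position in every 100,000 tokens.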

    No known activations.