INDEX
    NEGATIVE LOGITS
    Token                 Logit
    " Lec"                -0.09
    " Demokrat"           -0.08
    " Gerade"             -0.08
    " Así"                -0.08
    " печени"             -0.08
    "isations"            -0.08
    " chansons"           -0.08
    " Representation"     -0.08
    " Darstellung"        -0.07
    " marchand"           -0.07

    POSITIVE LOGITS
    Token                 Logit
    "Rich"                0.08
    " корот"              0.08
    " rich"               0.08
    ".Rich"               0.07
    "rich"                0.07
    " hoe"                0.07
    ".rich"               0.07
    " brief"              0.07
    " cycling"            0.07
    "عار"                 0.07

    Activation Density: 0.001%

    No Known Activations