INDEX
    Explanations
    Negative Logits
    " awesome"     -0.16
    "Awesome"      -0.15
    "awesome"      -0.15
    " Awesome"     -0.12
    " cool"        -0.12
    "Fuck"         -0.12
    " sisters"     -0.11
    " fuck"        -0.11
    " dude"        -0.11
    "fuck"         -0.10
    Positive Logits
    " young"       0.20
    "young"        0.15
    " молод"       0.15
    " mlad"        0.13
    " folks"       0.13
    " son"         0.12
    " ol"          0.12
    " kids"        0.12
    " jeune"       0.11
    " nonsense"    0.11
    Act Density 0.090%

    No Known Activations