INDEX
    Explanations

    young animals

    Negative Logits
    " gays"         -0.07
    " jokes"        -0.07
    "Boston"        -0.07
    "Coder"         -0.07
    "Ny"            -0.06
    " scooter"      -0.06
    " hos"          -0.06
    " smoke"        -0.06
    " ignorance"    -0.06
    "/access"       -0.06
    Positive Logits
    "ี้"             0.06
    "(hero"         0.06
    " kitten"       0.06
    " Lightning"    0.06
    ".elements"     0.06
    "/mit"          0.06
    " Puppy"        0.06
    ".password"     0.06
    " partial"      0.06
    "/thread"       0.06
    Activation Density: 0.010%

    No Known Activations