INDEX
    Explanations

    kena, suay, hacked, getroffen
    (roughly "struck by", "unlucky", "hacked", "hit": suffering a misfortune)

    Negative Logits
      Token              Value
      " Hearst"          0.38
      " unhealthy"       0.37
      " emergent"        0.37
      "عبير"             0.37
      "чества"           0.36
      " শির"             0.36
      "ịnh"              0.36
      "честве"           0.35
      (token not shown)  0.35
      "다운"             0.34
    Positive Logits
      Token              Value
      " terken"          0.69
      " kena"            0.61
      " getroffen"       0.56
      " pata"            0.55
      "hab"              0.54
      " robbed"          0.50
      " puk"             0.49
      " trampled"        0.49
      " hab"             0.48
      " geb"             0.48
    Activation density: 0.001%

    No Known Activations