INDEX
    Explanations
    NEGATIVE LOGITS
    " pollen"     -0.08
    " diesem"     -0.08
    " foll"       -0.07
    ".po"         -0.07
    " péri"       -0.06
    " pozn"       -0.06
    " Bowling"    -0.06
    " ppl"        -0.06
    " Fri"        -0.06
    " Toilet"     -0.06
    POSITIVE LOGITS
    " defy"       0.08
    "isode"       0.07
    " bark"       0.06
    "><?"         0.06
    "verbs"       0.06
    "↵↵↵"         0.06
    "\t↵\t↵↵"     0.06
    "(input"      0.06
    "алось"       0.06
    "iod"         0.06
    Act Density: 0.003%

    No Known Activations