    Negative Logits
    " milf"        -1.18
    " fta"         -1.15
    " increa"      -1.15
    " fortn"       -1.14
    " fuf"         -1.14
    " strick"      -1.12
    " snoopy"      -1.11
    " disagre"     -1.11
    " apprehen"    -1.10
    " »>"          -1.10
    Positive Logits
    " toy"         1.51
    " toys"        1.48
    "Toy"          1.41
    " Toy"         1.36
    "toy"          1.34
    " Toys"        1.17
    "toys"         1.10
    "Toys"         1.04
    " TOY"         0.99
    "玩具"         0.92
    Activation Density: 0.090%

    No Known Activations