INDEX
    Explanations
    Negative Logits
    " raging"       -0.07
    " dost"         -0.07
    "ATT"           -0.07
    " Murphy"       -0.07
    " ters"         -0.07
    " suff"         -0.07
    " impatient"    -0.07
    " curtains"     -0.07
    " glossy"       -0.06
    "arty"          -0.06
    Positive Logits
    " bi"           0.17
    " Bi"           0.15
    " Bio"          0.14
    "Bi"            0.13
    " bio"          0.12
    "Bio"           0.11
    " би"           0.11
    " BI"           0.10
    " Би"           0.10
    " biod"         0.10
    Activation Density 0.037%

    No Known Activations