    Negative Logits
    Token             Logit
    " ful"            -0.08
    " extran"         -0.08
    "CCR"             -0.08
    " zurück"         -0.08
    " darling"        -0.08
    (blank)           -0.07
    " preferable"     -0.07
    " corret"         -0.07
    " extranj"        -0.07
    "Tudo"            -0.07
    Positive Logits
    Token             Logit
    " stave"          0.08
    (blank)           0.08
    "made"            0.08
    "-hole"           0.08
    " constructed"    0.07
    " изготов"        0.07
    " prost"          0.07
    "witch"           0.07
    "נים"             0.07
    "ointment"        0.07
    Activation Density: 0.001%

    No Known Activations