INDEX
    Explanations

    the presence of the substring "Tr" within words

    Negative Logits
    ozí          -0.20
    iap          -0.18
    yh           -0.18
    yd           -0.17
    iem          -0.17
    enko         -0.16
    eners        -0.16
    aimassage    -0.16
    oit          -0.15
    ertas        -0.15
    Positive Logits
    acy          0.30
    inity        0.29
    avis         0.29
    inidad       0.29
    ailer        0.28
    usted        0.27
    istan        0.27
    actor        0.27
    udeau        0.26
    ained        0.25
    Act Density 0.011%

    No Known Activations