    Explanations

    phrases that highlight significant social or health-related messages

    Negative Logits
    idla        -0.14
     [â̦        -0.14
     //~        -0.14
    mekte       -0.14
    ~/          -0.13
     Duffy      -0.13
     Franken    -0.13
    ffer        -0.13
    lf          -0.13
     Grove      -0.13
    Positive Logits
    à¥ĩà¤Ĥ↵      0.18
    %)↵          0.15
    .intellij    0.14
     ¶           0.14
    rieve        0.13
    elopment     0.13
    %)↵↵         0.13
    arma         0.13
    Ì            0.12
    ↵            0.12
    Activation Density: 0.123%

    No Known Activations