    Negative Logits
    Token               Logit
    " pronunciation"    -0.10
    "anford"            -0.10
    " meaning"          -0.09
    "ERIC"              -0.09
    "ORB"               -0.09
    " Speaker"          -0.09
    " mis"              -0.08
    " succinct"         -0.08
    "SCO"               -0.08
    " Oriental"         -0.08
    Positive Logits
    Token               Logit
    " plain"            0.17
    " Plain"            0.14
    " convers"          0.14
    " style"            0.13
    " third"            0.12
    " neutral"          0.12
    "plain"             0.12
    " leg"              0.11
    " formal"           0.11
    " styles"           0.11
    Act Density 0.233%

    No Known Activations