    Explanations

    bold actions or statements

    Negative Logits
    OTOS      -0.90
     Cheong   -0.87
    enfranch  -0.65
     duly     -0.64
    ADS       -0.63
    utra      -0.63
    AW        -0.62
    yip       -0.62
    apolis    -0.62
    nesota    -0.61
    Positive Logits
    faced     1.20
    er        1.09
    ness      1.05
    face      0.98
    est       0.89
    mouth     0.89
    nesses    0.84
    ly        0.82
    bold      0.81
    word      0.81
    Act Density 0.028%

    No Known Activations