INDEX
    Explanations

    words that convey assertiveness or confidence

    Negative Logits
    "rete"      -0.16
    "ever"      -0.15
    "vez"       -0.15
    "资料"      -0.15
    "ctor"      -0.15
    "acro"      -0.15
    "ταν"       -0.15
    "plode"     -0.15
    "htable"    -0.14
    "trinsic"   -0.14
    Positive Logits
    "ness"      0.33
    "face"      0.28
    "-faced"    0.25
    "-face"     0.24
    "ly"        0.23
    " enough"   0.22
    "ened"      0.21
    "speaker"   0.20
    " faced"    0.20
    "symbol"    0.20
    Activation density: 0.026%

    No Known Activations