    Negative Logits
    " enlightenment"  0.56
    " enlighten"      0.43
    "ements"          0.42
    "ambarkan"        0.42
    " enlight"        0.42
    " передви"        0.42
    " rhetoric"       0.41
    "enseignement"    0.40
    "象征"            0.40
    "姿"              0.40
    Positive Logits
    " detachment"  0.64
    " detached"    0.56
    " detach"      0.56
    " sincere"     0.54
    " equ"         0.54
    " sad"         0.52
    " SAD"         0.51
    " DET"         0.51
    " steady"      0.50
    " Equ"         0.48
    Activation Density: 0.007%

    No Known Activations