    Negative Logits
    Token               Logit
    " autorytatywna"    -0.71
    "ValueStyle"        -0.69
    "ymce"              -0.65
    " للمعارف"          -0.64
    "Tikang"            -0.59
    " GenerationType"   -0.59
    " OFDb"             -0.57
    "Hentet"            -0.56
    " estekak"          -0.55
    " gyhoeddwyd"       -0.54

    Positive Logits
    Token               Logit
    " electrode"        1.91
    " Electrode"        1.71
    " electrodes"       1.68
    "electrode"         1.53
    " ELECTRO"          1.00
    "Electro"           0.90
    "electro"           0.81
    " Electro"          0.77
    " électro"          0.76
    "lectro"            0.71

    Activation Density: 0.010%

    No Known Activations