INDEX
    Explanations

    positive adjectives describing quality or experience

    Negative Logits
    Token               Logit
    " autorytatywna"    -0.57
    "Ligações"          -0.56
    " těch"             -0.54
    " démission"        -0.49
    " diejenigen"       -0.49
    " protože"          -0.48
    "ambién"            -0.47
    " those"            -0.47
    " neler"            -0.47
    " förr"             -0.47
    Positive Logits
    Token               Logit
    " joint"            0.51
    "bleau"             0.47
    ","                 0.47
    " <<<<<<<<<<<<<<"   0.46
    " entire"           0.46
    " surla"            0.46
    " MainAxisSize"     0.44
    " a"                0.44
    " whole"            0.43
    " face"             0.43
    Activation Density: 0.781%

    No Known Activations