INDEX
    Negative Logits
    Token          Logit
    " taken"       -1.71
    "taken"        -1.50
    " Taken"       -1.39
    " TAKEN"       -1.37
    "Taken"        -1.34
    " took"        -1.18
    " tomada"      -0.98
    "took"         -0.98
    " diambil"     -0.98
    " Took"        -0.95
    Positive Logits
    Token          Logit
    " advantage"   0.69
    " off"         0.66
    " place"       0.59
    " particular"  0.56
    " out"         0.56
    "ed"           0.54
    " nearly"      0.54
    "Източници"    0.52
    "ths"          0.52
    "obacteria"    0.52
    Activation Density: 0.039%

    No Known Activations