    Negative Logits
    Token            Logit
    "rook"           -0.56
    " unplayable"    -0.52
    " unlucky"       -0.49
    " antif"         -0.49
    " rook"          -0.47
    " Marín"         -0.47
    " annals"        -0.47
    " Rook"          -0.47
    "Rook"           -0.46
    " octo"          -0.46
    Positive Logits
    Token            Logit
    " desire"        1.77
    "desire"         1.58
    " Desire"        1.54
    "Desire"         1.52
    " desires"       1.52
    " deseo"         1.26
    " désir"         1.24
    " desiring"      1.22
    " desired"       1.20
    " desiderio"     1.15
    Activation Density: 0.013%

    No Known Activations