INDEX
    Explanations

    phrases indicating relationships or connections between entities

    Negative Logits
    Token       Logit
    " two"      -1.08
    "two"       -0.91
    " deux"     -0.86
    " three"    -0.85
    " TWO"      -0.82
    " dwóch"    -0.81
    " zwei"     -0.78
    "TWO"       -0.72
    " trois"    -0.72
    " THREE"    -0.71
    Positive Logits
    Token           Logit
    " brightest"    0.71
    " loudest"      0.67
    " deadliest"    0.67
    " stesso"       0.65
    " stessi"       0.64
    " mismos"       0.63
    " heaviest"     0.63
    " terbesar"     0.60
    " misma"        0.59
    " strongest"    0.59
    Activation Density: 0.079%

    No Known Activations