INDEX
    Negative Logits
    Token           Logit
    " word"         -2.05
    " WORD"         -1.33
    " Word"         -1.32
    " palabra"      -1.27
    " palavra"      -1.27
    "Word"          -1.27
    "word"          -1.25
    " woord"        -0.98
    " parola"       -0.97
    "WORD"          -0.92
    Positive Logits
    Token           Logit
    "atrician"      0.61
    "group"         0.60
    "smith"         0.59
    " caribe"       0.58
    "brains"        0.56
    " Brasileiro"   0.56
    "play"          0.55
    "base"          0.54
    " endings"      0.54
    "stand"         0.53
    Activation Density: 0.022%

    No Known Activations