INDEX
    Explanations
    Negative Logits
    Token            Logit
    "tal"            -0.77
    " ingred"        -0.77
    "ibur"           -0.75
    "tre"            -0.74
    "bol"            -0.70
    " ens"           -0.68
    " summed"        -0.66
    " lodged"        -0.65
    "wagon"          -0.64
    "$$"             -0.63
    Positive Logits
    Token            Logit
    " conjunction"    1.46
    " accordance"     1.23
    " lieu"           1.22
    " theaters"       1.17
    " tandem"         1.12
    " order"          0.97
    " spite"          0.95
    " Japan"          0.92
    " situ"           0.92
    "clus"            0.91
    Act Density 0.214%

    No Known Activations