    NEGATIVE LOGITS
    Token           Logit
    " detriment"    -0.11
    "azar"          -0.10
    "imi"           -0.10
    "roz"           -0.09
    "iba"           -0.09
    " sovere"       -0.09
    " CONSTANTS"    -0.09
    " eas"          -0.09
    "ubar"          -0.09
    " staples"      -0.09
    POSITIVE LOGITS
    Token           Logit
    " pointer"      0.09
    " major"        0.09
    " advoc"        0.09
    " termin"       0.09
    " rhetoric"     0.09
    "ilar"          0.09
    " folds"        0.09
    " enlisted"     0.09
    " prime"        0.08
    " barn"         0.08
    Activation Density: 0.153%

    No Known Activations