INDEX
    Explanations
    Negative Logits
    Token           Logit
    " chr"          -0.07
    " timers"       -0.07
    " booze"        -0.06
    "\tbreak"       -0.06
    " Brom"         -0.06
    " scares"       -0.06
    " proph"        -0.06
    "022"           -0.06
    "-no"           -0.06
    "shit"          -0.06
    Positive Logits
    Token           Logit
    " elegant"      0.20
    " elegance"     0.14
    " Elegant"      0.13
    "legant"        0.10
    " eleg"         0.09
    " gracefully"   0.09
    " elo"          0.08
                    0.07
    " εργ"          0.07
    "agonal"        0.07
    Activation Density: 0.003%

    No Known Activations