    Negative Logits
    Token            Logit
    "boro"           -0.87
    " temptation"    -0.72
    " homebrew"      -0.71
    " tink"          -0.71
    " Chandler"      -0.67
    " Floyd"         -0.66
    " ponies"        -0.66
    " Boise"         -0.66
    " Austin"        -0.66
    " Hydra"         -0.64
    Positive Logits
    Token                 Logit
    "Moreover"            1.13
    "Therefore"           1.09
    "³³³³³³³³³³³³³³³³"    1.06
    "Refer"               1.06
    "Furthermore"         1.05
    "³³³"                 1.04
    "However"             1.01
    "³³³³³³³³"            1.01
    "³³³³"                0.94
    "Meanwhile"           0.93
    Activation Density: 0.399%

    No Known Activations