    Negative Logits
    Token            Logit
    " parting"       -0.80
    " pudding"       -0.78
    " favor"         -0.77
    " secret"        -0.70
    " dispar"        -0.70
    " classified"    -0.70
    " landslide"     -0.70
    " bunk"          -0.69
    " dominate"      -0.69
    " friendly"      -0.69

    Positive Logits
    Token            Logit
    "It"             1.48
    "We"             1.46
    "They"           1.44
    "There"          1.42
    "Because"        1.39
    "Especially"     1.37
    "Sometimes"      1.36
    "I"              1.35
    "But"            1.35
    "Obviously"      1.34
    Activation Density: 1.058%

    No Known Activations