    Negative Logits
                    -0.07
    156             -0.06
    _INITIAL        -0.06
     cose           -0.06
    05              -0.06
    101             -0.06
     caffeine       -0.06
    figcaption      -0.06
     delimited      -0.06
     dbs            -0.06
    Positive Logits
     older          0.19
    Older           0.12
    ốc              0.09
     younger        0.08
     newer          0.08
     wealthy        0.07
     warrior        0.07
     quadr          0.07
    Loader          0.07
     tegen          0.07
    Activation Density: 0.009%

    No Known Activations