    NEGATIVE LOGITS
    "onge"          -0.07
    "(pref"         -0.07
    "-pref"         -0.07
    "base"          -0.07
    "Arch"          -0.07
    "built"         -0.07
    " nyere"        -0.07
    "pref"          -0.07
    "aum"           -0.07
    "******↵"       -0.07
    POSITIVE LOGITS
    " imagination"  0.12
    " antics"       0.10
    " speculation"  0.09
    "-haired"       0.09
    " Beast"        0.09
    (not displayed) 0.09
    " wild"         0.08
    (not displayed) 0.08
    " dữ"           0.08
    " unleashed"    0.08
    Activation density: 0.009%

    No Known Activations