INDEX
    Explanations

    phrases indicating additional quantity or items

    Negative Logits
    Token        Logit
    " peg"       -0.14
    "prop"       -0.14
    "oshi"       -0.14
    "coon"       -0.14
    "aklı"       -0.13
    "iram"       -0.13
    "maktan"     -0.13
    "base"       -0.13
    " tread"     -0.13
    "victim"     -0.13
    Positive Logits
    Token        Logit
    "(extra"     0.23
    " extra"     0.23
    "-extra"     0.23
    " added"     0.20
    " EXTRA"     0.20
    "extra"      0.20
    "/add"       0.20
    " Added"     0.19
    "-added"     0.19
    "/new"       0.17
    Activation Density 0.169%

    No Known Activations