INDEX
    Explanations
    Negative Logits
    " greedy"         -0.07
    "_filt"           -0.07
    " Ming"           -0.07
    " більш"          -0.07
    "_chain"          -0.07
    " month"          -0.07
    " steam"          -0.06
    " товарів"        -0.06
    "’"               -0.06
    " Simmons"        -0.06
    Positive Logits
    " father"         0.16
    " Father"         0.14
    " dad"            0.13
    " Dad"            0.12
    "Father"          0.11
    " fathers"        0.10
    " dads"           0.09
    " Fathers"        0.09
    " grandfather"    0.09
    "father"          0.09
    Activation Density: 0.017%

    No Known Activations