INDEX
    Explanations

    words that express perception or likelihood

    Negative Logits
    ught       -0.17
    sein       -0.16
    pent       -0.14
    itol       -0.14
    ppe        -0.14
    pot        -0.14
    ses        -0.14
    ewidth     -0.14
    omer       -0.14
    LETE       -0.14
    Positive Logits
    lessly     0.17
    ingly      0.17
    ance       0.15
     vậy       0.15
    URRENT     0.15
    razione    0.14
    ively      0.13
    alf        0.13
    417        0.13
     cref      0.13
    Act Density 0.045%

    No Known Activations