    Explanations

    numerical values or references to statistics and quantities

    Negative Logits
    " favors"      -0.19
    " honor"       -0.18
    " honorable"   -0.17
    " honors"      -0.17
    " molding"     -0.17
    " Harbor"      -0.17
    "avior"        -0.16
    " honoring"    -0.16
    " theater"     -0.16
    " colorful"    -0.16
    Positive Logits
    "sdale"        0.15
    "é£"           0.15
    "à¸¸à¸Ľ"       0.14
    "éĥ"           0.14
    "croft"        0.14
    " |_|"         0.14
    " headline"    0.14
    "ben"          0.14
    "HEMA"         0.14
    ".exc"         0.14
    Activation Density: 0.125%

    No Known Activations