INDEX
    Explanations

    okay, followed by comma

    Negative Logits
     clarification    0.98
     clarified        0.93
     asymmetry        0.86
    clar              0.86
     replied          0.83
     clarifies        0.82
     clarifying       0.81
    plain             0.80
    filenames         0.78
     imperfections    0.76
    Positive Logits
     we       1.18
     We       1.13
     WE       1.12
     мы       1.11
    we        1.10
     Chúng    1.09
     ہم       1.06
    我們要    1.05
     আমরা     1.04
    We        1.04
    Activation Density 0.127%

    No Known Activations