    Explanations

    phrases indicating comparisons or evaluations of things or ideas

    Negative Logits
        " itself"       -0.28
        "was"           -0.21
        "它"            -0.20   (Chinese: "it")
        "的一个"        -0.19   (Chinese: roughly "one of")
        " was"          -0.18
        " its"          -0.17
        " Its"          -0.16
        " wasn"         -0.15
        "Its"           -0.15
        " оно"          -0.15   (Russian: "it")
    Positive Logits
        " themselves"    0.45
        " ones"          0.38
        " examples"      0.32
        " those"         0.31
        "nt"             0.30
        " are"           0.30
        " exceptions"    0.29
        " originals"     0.28
        " reminders"     0.28
        " favorites"     0.28
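    The page does not say how these logit values are computed. A common approach on SAE dashboards is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below assumes that approach; the function name top_logit_effects and the argument names w_dec, w_u and vocab are hypothetical, and any normalization a given dashboard applies is not modeled.

```python
import numpy as np


def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Hypothetical sketch: assumes w_dec is the feature's decoder vector,
    shape (d_model,), and w_u is the unembedding matrix, shape
    (d_model, n_vocab). vocab maps token index -> token string.
    """
    # Direct logit effect of the feature on every vocabulary token.
    effects = np.asarray(w_dec) @ np.asarray(w_u)  # shape (n_vocab,)
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example with made-up weights (not taken from this feature).
    w_dec = np.array([1.0, 0.0])
    w_u = np.array([[0.5, -0.2, 0.1, -0.4],
                    [9.0, 9.0, 9.0, 9.0]])
    neg, pos = top_logit_effects(w_dec, w_u, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    Sorting the whole vocabulary with np.argsort is simple but does O(V log V) work; for large vocabularies, np.argpartition can select the top k first and then sort only those.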
    Activation density: 0.487%
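    Assuming activation density means the share of token positions on which the feature's activation is above zero, 0.487% works out to roughly 5 of every 1,000 tokens. A minimal sketch of that calculation follows. The function name activation_density and its threshold parameter are hypothetical and do not come from the dashboard itself.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    Hypothetical sketch: acts is a 1-D sequence of one feature's
    activations over a sample of tokens.
    """
    acts = np.asarray(acts, dtype=float)
    # Mean of a boolean mask = fraction of positions that are "active".
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
    density = activation_density(acts)
    print(f"{density:.2%}")  # 2 of 8 positions are active
```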

    No Known Activations