    Explanations

    references to numerical values and quantities

    Negative Logits
    伍          -0.19
    nings       -0.17
    jes         -0.16
    iva         -0.15
    jar         -0.15
    iche        -0.15
    乎          -0.15
    eds         -0.15
    ees         -0.14
    istant      -0.14
    Positive Logits
     Shades     0.22
    ity         0.20
    alog        0.17
    다          0.17
    ities       0.17
    th          0.16
     shades     0.16
    SSION       0.16
    ก           0.15
    ones        0.15
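Logit lists like these are commonly read as the feature's direct effect on the output vocabulary. The feature's decoder direction is multiplied by the model's unembedding matrix W_U, and the most positive and most negative entries are reported. The sketch below assumes exactly that computation, with no layer-norm correction. The function names `logit_effects` and `top_k_logits` and the toy matrices are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_dir has length d_model; unembed is a d_model x vocab matrix
    given as a list of rows. Entry t of the result is decoder_dir . W_U[:, t].
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_k_logits(scores: Sequence[float], tokens: Sequence[str],
                 k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most negative scores."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=positive)
    return [(tokens[t], scores[t]) for t in order[:k]]


# Hypothetical toy model: d_model = 2, four-token vocabulary.
tokens = ["a", "b", "c", "d"]
W_U = [[0.2, -0.1, 0.0, 0.3],
       [0.0, 0.4, -0.2, 0.1]]
w_dec = [1.0, 0.5]

scores = logit_effects(w_dec, W_U)
print(top_k_logits(scores, tokens, k=2, positive=True))   # strongest boosts
print(top_k_logits(scores, tokens, k=1, positive=False))  # strongest suppressions
```

In a real model the vocabulary has tens of thousands of entries and k is around 10, which yields two lists like the Negative and Positive Logits shown here.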
    Activation Density 0.178%
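Activation density is the share of tokens on which the feature is active. A density of 0.178% means the feature fires on roughly 178 of every 100,000 tokens. The minimal sketch below assumes that "active" means an activation strictly greater than zero. That threshold, the helper name `activation_density`, and the sample data are all assumptions for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 2 of 1,000 tokens.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints 0.200%
```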

    No Known Activations