    Explanations

    questions or phrases expressing the extent of emotions or experiences

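    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    Token            Logit
    "nist"           -0.15
    "erta"           -0.15
    "igli"           -0.15
    "剛"             -0.14
    " whose"         -0.14
    "最佳"           -0.14
    " Ukra"          -0.14
    " preferably"    -0.14
    "是否"           -0.13
    "imary"          -0.13

    Positive Logits
    Token            Logit
    " much"           0.26
    "much"            0.24
    "Much"            0.21
    "itzer"           0.19
    " important"      0.19
    " Much"           0.19
    " wrong"          0.18
    " little"         0.18
    "atta"            0.17
    " lucky"          0.16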
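    The page does not say how the two logit lists are produced. A common approach, assumed here, is a logit-lens style projection: multiply the feature's decoder direction by the model's unembedding matrix and rank the per-token scores. This is a minimal sketch under that assumption; `top_logit_effects`, the two-dimensional direction, and the `tok_*` vocabulary are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Rank tokens by the direct logit effect of one feature direction.

    Assumption: decoder_direction has length d_model and unembedding is a
    d_model x vocab_size matrix (W_U) stored as nested lists.
    Returns (positive, negative): the k highest and k lowest (token, logit) pairs.
    """
    d_model = len(decoder_direction)
    # logits[j] = sum_i direction[i] * W_U[i][j]  (a vector-matrix product)
    logits = [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: a 2-dimensional model with a hypothetical 4-token vocabulary.
direction = [1.0, -0.5]
w_u = [
    [0.2, 0.1, -0.3, 0.0],
    [0.0, 0.4, 0.2, -0.2],
]
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
positive, negative = top_logit_effects(direction, w_u, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With real weights and k=10 this would yield lists shaped like the ones above; exact values would also depend on whether any final normalization is folded into the projection, which this sketch ignores.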
    Activation Density  0.044%
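    The page does not define the density figure (0.044%). Assuming it is the share of sampled tokens on which the feature's activation is above zero, 0.044% works out to about 44 tokens in every 100,000, so the feature fires rarely. A minimal sketch of that calculation; `activation_density`, its `threshold` parameter, and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation is above threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: the feature fires on 44 of 100,000 tokens.
sample = [1.0] * 44 + [0.0] * (100_000 - 44)
print(f"{activation_density(sample):.3f}%")  # prints 0.044%
```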

    No Known Activations