INDEX
    Explanations

    instances of high numerical values or significant figures in various contexts

    Negative Logits
    Token          Logit
    " eux"         -0.17
    " ragaz"       -0.14
    "ë¹Ļ"          -0.14
    "ymes"         -0.14
    " нÑĮого"      -0.14
    " THEM"        -0.14
    "ká"           -0.14
    " него"        -0.13
    "æĺ¯æĪij"       -0.13
    "yas"          -0.13

    Positive Logits
    Token          Logit
    " Ù쨥ÙĨ"       0.28
    " there"       0.28
    " we"          0.25
    " it"          0.22
    " thì"         0.22
    "there"        0.21
    " they"        0.20
    "we"           0.19
    " nobody"      0.18
    " nothing"     0.18
    Activation Density: 0.448%

    No Known Activations