INDEX
    Explanations

    instances of quantities, capacities, and limitations in various contexts

    NEGATIVE LOGITS
    给我
    -0.17
     eux
    -0.16
    让我
    -0.16
    是我
    -0.16
    yla
    -0.14
    orne
    -0.14
     hatta
    -0.13
    ank
    -0.13
    ,然后
    -0.13
     him
    -0.13
    POSITIVE LOGITS
     there
    0.39
     it
    0.29
    there
    0.27
     Ù쨥ÙĨ
    0.26
     they
    0.25
     nobody
    0.25
     we
    0.24
     everything
    0.23
     thì
    0.23
     nothing
    0.22
    Act Density 1.042%

    No Known Activations