    Explanations

    occurrences of the word "this" in various contexts

    Negative Logits
    Token         Logit
    "awan"        -0.07
    "OLEAN"       -0.07
    "æ¡IJ"         -0.06
    " Release"    -0.06
    "ved"         -0.06
    "avor"        -0.06
    " Purs"       -0.06
    "anyahu"      -0.06
    "YM"          -0.06
    " Boeh"       -0.06
    Positive Logits
    Token         Logit
    "671"         0.07
    "677"         0.06
    "745"         0.06
    "114"         0.06
    "919"         0.06
    "399"         0.06
    "ãĤĵ"         0.06
    "918"         0.06
    "rog"         0.06
    "rix"         0.06
    Activation Density: 0.011%

    No Known Activations