    Explanations

    referring or mentioning

    Negative Logits
    眼泪 (tears)
    -0.30
    ipse
    -0.29
    ancy
    -0.28
    aji
    -0.28
    hip
    -0.27
    徂 (to go; archaic)
    -0.27
    Jess
    -0.27
    ographs
    -0.26
     Burl
    -0.26
    lege
    -0.26
    Positive Logits
    值得注意 (worth noting)
    0.27
    的重要 (possessive particle + "important")
    0.27
     another
    0.27
    Fact
    0.27
     notable
    0.26
    较高 (relatively high)
    0.26
    值得一提 (worth mentioning)
    0.26
    滤 (filter)
    0.25
    有利 (advantageous)
    0.25
     fact
    0.25
    Act Density 0.011%

    No Known Activations