INDEX
    NEGATIVE LOGITS
    izador
    -0.28
     liber
    -0.27
    kar
    -0.27
    ucher
    -0.26
    browser
    -0.26
    BAT
    -0.26
    楷
    -0.26
    camp
    -0.25
    reading
    -0.25
    edReader
    -0.25
    POSITIVE LOGITS
    睛
    0.27
    组成
    0.26
    lest
    0.25
    瘫
    0.25
    部
    0.24
    eson
    0.24
    地带
    0.24
    _TESTS
    0.24
    构成
    0.24
    锥
    0.23
    Act Density 0.020%

    No Known Activations