INDEX
    Explanations

    describing extent or usage

    Negative Logits
    " oftentimes"    0.51
    "管控"    0.48
    "ഷണ"    0.45
    " каждой"    0.45
    "づくり"    0.44
    "umuza"    0.43
    "每一個"    0.43
    " এটির"    0.42
    (blank)    0.42
    " முழுவதும்"    0.42
    Positive Logits
    " was"    0.49
    " did"    0.42
    " underlines"    0.40
    " went"    0.40
    "sset"    0.39
    " advice"    0.39
    " stung"    0.39
    " travelled"    0.38
    " suffered"    0.38
    (blank)    0.38
    Act Density 0.001%

    No Known Activations