INDEX
    Explanations

    expressions involving comprehension or awareness

    Negative Logits
    Token          Logit
    "ぶり"          -0.56
    "räck"         -0.53
    "ぶりの"        -0.51
    "Citation"     -0.51
    "IGraphics"    -0.48
    "onOptions"    -0.48
    "fabs"         -0.47
    "ifte"         -0.47
    " spese"       -0.47
    "tığı"         -0.47
    Positive Logits
    Token               Logit
    " understand"       4.58
    "understand"        4.07
    " understands"      4.00
    " Understand"       3.98
    " understanding"    3.85
    " understood"       3.82
    "Understand"        3.81
    "understanding"     3.55
    " Understanding"    3.37
    "understood"        3.29
    Activation Density: 0.075%

    No Known Activations