INDEX
    Explanations

    statements and assertions involving "the" and other definite references

    Negative Logits
    "ãĥ³ãĤ¯"         -0.14
    "olt"            -0.14
    "burg"           -0.14
    "rada"           -0.14
    ".eng"           -0.13
    "бина"           -0.13
    " possibility"   -0.13
    "ļĮ"             -0.12
    "ãĤĩ"            -0.12
    "deriv"          -0.12
    Positive Logits
    " reason"    0.32
    " problem"   0.26
    " key"       0.22
    " Problem"   0.21
    " issue"     0.21
    "problem"    0.20
    " main"      0.20
    " thing"     0.20
    " answer"    0.19
    " trouble"   0.19
    Act Density 0.277%

    No Known Activations