INDEX
    Explanations

    numerical data and references

    Negative Logits
     -↵
    -0.23
     -
    -0.20
     ‐ (U+2010)
    -0.20
    ...↵
    -0.19
    ...'↵
    -0.18
    (...)↵
    -0.18
    <strong
    -0.17
    ..."↵
    -0.17
    ...)↵
    -0.17
    ‑ (U+2011)
    -0.16
    Positive Logits
     _
    0.30
     esl
    0.30
     essay
    0.29
     Essay
    0.28
     (_
    0.28
     dissertation
    0.24
    --
    0.24
     essays
    0.23
     Dissertation
    0.23
    .--
    0.23
    Act Density 0.031%

    No Known Activations