    Explanations

    numeric values and percentages within text

    Negative Logits
    Token        Logit
    "eg"         -0.16
    "opper"      -0.15
    "esp"        -0.15
    "دÙĪØ¯"      -0.14
    " whose"     -0.14
    "whose"      -0.14
    "lore"       -0.14
    " totaling"  -0.14
    "áh"         -0.14
    "chter"      -0.13
    Positive Logits
    Token          Logit
    " compared"    0.24
    " far"         0.22
    "far"          0.20
    " equivalent"  0.19
    "ãģĨãģ¡"       0.19
    "urette"       0.19
    " down"        0.18
    " enough"      0.18
    "represent"    0.17
    " equ"         0.17
    Activation Density: 0.106%

    No Known Activations