INDEX
    Explanations
    Negative Logits
    Token        Logit
    " Test"      -0.08
    " stat"      -0.08
    " Fort"      -0.08
    " zat"       -0.07
    " tort"      -0.07
    "stant"      -0.07
    "269"        -0.07
    " Pot"       -0.07
    " east"      -0.07
    " TEST"      -0.07
    Positive Logits
    Token           Logit
    " include"      0.16
    " includes"     0.16
    " included"     0.12
    " including"    0.11
    " INCLUDE"      0.09
    "including"     0.09
    "Includes"      0.09
    " inclusion"    0.08
    "Include"       0.08
    " includ"       0.08
    Activation Density: 0.121%

    No Known Activations