INDEX
    Explanations

    references to specific instances or examples

    Negative Logits
    Token              Logit
    " Phry"            -0.69
    "httphttps"        -0.64
    "writeField"       -0.61
    " betweenstory"    -0.61
    " shovels"         -0.60
    " orch"            -0.58
    " sherds"          -0.57
    " gddr"            -0.57
    " subgoals"        -0.56
    "folios"           -0.56
    Positive Logits
    Token              Logit
    " particular"      1.09
    " thing"           0.85
    "particular"       0.85
    " kind"            0.81
    " stuff"           0.79
    " guy"             0.78
    " entire"          0.72
    " daqui"           0.71
    " wonderful"       0.71
    " incredible"      0.70
    Activation Density: 0.415%

    No Known Activations