INDEX
    Explanations

    references to scientific citations or bibliographic references

    Negative Logits
    Token         Logit
    (blank)       -0.77
    l             -0.71
    classnames    -0.66
    Sk            -0.65
    ms            -0.65
    Si            -0.64
    p             -0.63
    f             -0.63
    (not shown)   -0.63
    р             -0.62
    Positive Logits
    Token         Logit
    [@            1.36
     [@           1.09
    /@            0.92
    :@            0.92
    ("@           0.90
    >@            0.87
    ="@           0.86
     '@           0.85
    ('@           0.85
    =@            0.85
    Activation Density: 0.720%

    No Known Activations