INDEX
    Explanations
    Negative Logits
    sw          -0.58
     no         -0.55
     dating     -0.55
    forth       -0.54
     taller     -0.53
     cons       -0.52
     office     -0.51
     Japan      -0.51
     sent       -0.50
     long       -0.50
    Positive Logits
    %.          3.79
    %).         2.64
    %,          2.62
    %:          2.50
    %;          2.48
    %-          2.06
    %           2.03
    %"          1.90
    %),         1.85
    %]          1.79
    Activation Density: 0.006%

    No Known Activations