INDEX
    Explanations

    expressions of comparison or similarity

    Negative Logits
    Token           Logit
    ' Yet'          -0.50
    ' he'           -0.48
    ' yet'          -0.48
    (missing)       -0.46
    '::::::::'      -0.45
    ' simply'       -0.45
    '  '            -0.43
    ' she'          -0.42
    '反而'          -0.41
    ' ('            -0.41
    Positive Logits
    Token               Logit
    ' HasFactory'       0.94
    'wiſe'              0.85
    ' plupart'          0.83
    ' many'             0.82
    'RectangleBorder'   0.82
    ' $_"'              0.80
    ' ſeveral'          0.80
    ' רבים'             0.79
    ' myſelf'           0.77
    'many'              0.75
    Activation Density: 0.116%

    No Known Activations