INDEX
    Explanations

    phrases or structures that describe examples or clarifications

    Negative Logits
     purpoſe         -0.77
     ſub             -0.67
     Juifs           -0.63
     ſen             -0.61
     Diſ             -0.61
     ſmall           -0.61
     greateſt        -0.61
     houſe           -0.60
     Anſ             -0.59
    "]);\n           -0.58
    Positive Logits
     например        1.01
    例えば           1.01
     like            0.99
     bijvoorbeeld    0.98
     např            0.96
    比如             0.95
    voorbeeld        0.94
    like             0.93
     beispielsweise  0.93
    たとえば         0.93
    Act Density 0.923%

    No Known Activations