INDEX
    Explanations

    sections of text with high numerical values or counts

    Negative Logits
    Datuak
    -0.93
    lotz
    -0.84
     ویکی‌پدیای
    -0.82
    rsiniz
    -0.81
    tershire
    -0.81
    外部リンク
    -0.81
    imarães
    -0.81
    достатки
    -0.80
     "<?
    -0.76
     McIl
    -0.74
    Positive Logits
    s
    0.84
    [toxicity=0]
    0.80
    WebVitals
    0.75
    o
    0.66
     Denk
    0.66
    0.65
    er
    0.64
    intios
    0.64
    peper
    0.62
    ียม
    0.60
    Activation Density: 0.043%

    No Known Activations