INDEX
    Explanations
    Negative Logits
    " concern"       -2.89
    " concerned"     -2.58
    "concern"        -2.58
    " concerns"      -2.55
    " Concern"       -2.47
    "concerned"      -2.44
    "Concern"        -2.39
    " Concerns"      -2.34
    "concerns"       -2.34
    "Concerns"       -2.25

    Positive Logits
    " about"         0.57
    " ویکی‌پدیا"      0.55
    "PRNewswire"     0.54
    " over"          0.53
    "lessly"         0.51
    "utriche"        0.51
    " nakalista"     0.50
    "fully"          0.49
    "Hochspringen"   0.49
    "ality"          0.48
    Act Density 0.020%

    No Known Activations