    Explanations

    phrases related to social commentary on historical and contemporary issues

    Negative Logits
    :✨                -0.64
    ðsíða             -0.60
     myſelf           -0.56
    arangay           -0.56
     ainfi            -0.55
     незавершена      -0.54
     avoient          -0.54
    <unused14>        -0.53
    [@BOS@]           -0.53
    <pad>             -0.53
    Positive Logits
     equivalent        0.48
     substitutes       0.46
     substitute        0.44
     zamiast           0.42
     instead           0.41
    Instead            0.41
    instead            0.40
     замі              0.40
     replaces          0.39
     substitution      0.39
    Activation Density: 0.132%
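    Activation density is the share of sampled token positions on which the feature fires. Below is a minimal sketch that assumes "fires" means an activation strictly above zero. The threshold and the function name are hypothetical, and the corpus the dashboard sampled is not stated on this page. A density of 0.132% corresponds to roughly 132 firing positions per 100,000 tokens.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# whose feature activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Percentage of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 132 firing positions out of 100,000 sampled tokens.
    acts = [0.0] * 99_868 + [1.0] * 132
    print(f"{activation_density(acts):.3f}%")
```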

    No Known Activations