INDEX
    Explanations

    complexity and nuances in discussions about social issues

    Negative Logits
    ITO
    -0.13
    ITT
    -0.13
    ']!='
    -0.13
    िनक
    -0.13
    ósito
    -0.13
    ourg
    -0.13
    ITEM
    -0.13
    ittle
    -0.12
    ernen
    -0.12
    istes
    -0.12
    Positive Logits
    它
    0.90
     it
    0.89
     оно
    0.86
     its
    0.75
     nó
    0.68
    ，它
    0.68
     воно
    0.62
     Its
    0.60
    Its
    0.59
     It
    0.54
    Act Density 2.670%

    No Known Activations