INDEX
    Explanations

    concepts related to justice and fairness in social issues

    Text preceding end quotation marks

    end of sentence punctuation

    Negative Logits
    '),      -1.80
    '],      -1.80
    "],      -1.76
    '])      -1.74
    ']);     -1.72
    "])      -1.68
    "),      -1.68
    '))      -1.67
    '):      -1.67
    "]);     -1.62
    Positive Logits
    ."           0.87
    (not shown)  0.77
    .”           0.73
    .</          0.62
    .)           0.58
    (not shown)  0.52
    .,           0.49
    ั้ง          0.46
    ↵↵↵          0.45
    ↵↵           0.45
    Act Density 0.230%

    No Known Activations