    Explanations

    references to societal or cultural issues, particularly in the context of power dynamics

    Negative Logits
     Efq            -0.81
     незавершена    -0.72
    ՚               -0.69
    }")↵            -0.69
    AddTagHelper    -0.69
    TypedDataSet    -0.68
     houſe          -0.66
    ὸν              -0.64
     Koy            -0.64
    ++↵             -0.63

    Positive Logits
    .               0.84
    ;               0.62
    ?               0.58
    !               0.54
    RegressionTest  0.53
     oprot          0.52
    ↵↵↵             0.52
    aarrggbb        0.52
    dinga           0.51
     when           0.51
    Activation density: 0.979%

    No Known Activations