Explanations
mentions of actions or circumstances that involve harm, crime, or emotional distress
concepts related to identity and experiences of marginalization
Negative Logits
awarding      -0.69
ranking       -0.68
rg            -0.66
issuer        -0.64
testament     -0.62
authorizing   -0.61
1973          -0.60
1949          -0.60
2002          -0.60
Wen           -0.59
Positive Logits
ictionary     0.91
ザ            0.72
装            0.69
Yourself      0.69
cribed        0.67
quished       0.67
ドラ          0.67
omorphic      0.67
サーティワン  0.66
thood         0.66
Activations Density 0.269%