INDEX
Explanations
terms related to race and societal issues
racial epithets and slurs
Negative Logits
ButterKnife       -0.39
operacional       -0.38
operational       -0.36
operational       -0.36
emocion           -0.36
vyš               -0.35
esperamos         -0.35
RTLI              -0.34
Concorde          -0.34
transporte        -0.34
Positive Logits
betweenstory      0.53
препратки         0.51
                  0.48
ooga              0.45
resourceCulture   0.44
                  0.44
addGap            0.43
modb              0.42
szóci             0.42
𝒯                 0.42
Activations Density 0.026%