Explanations
phrases related to controversial or disputed topics or issues
instances of particular negative or derogatory terms
Negative Logits
faded         -0.73
装            -0.72
collided      -0.70
slumped       -0.70
swept         -0.69
passed        -0.69
patched       -0.67
scatter       -0.66
missed        -0.66
transferred   -0.65
Positive Logits
asking        0.85
§             0.84
†             0.81
◼             0.77
his           0.76
••            0.76
mental        0.75
abo           0.75
ISIS          0.74
Ibid          0.74
Activations Density 0.305%
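
Logit lists of this kind sometimes show non-ASCII tokens in a byte-level BPE form, where each UTF-8 byte is displayed as a stand-in character. The sketch below assumes the GPT-2 style byte-to-character table (the source does not name the tokenizer), and the helper names `bytes_to_unicode` and `decode_token` are hypothetical. Under that assumption, a display string such as `âĢł` maps back to the dagger character †.

```python
def bytes_to_unicode():
    """Build the byte -> display-character table (assumed GPT-2 style)."""
    # Printable bytes are displayed as themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Turn a byte-level display string back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    # Tokens may hold partial UTF-8 sequences, so invalid bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĢł", "âĢ¢âĢ¢", "âĹ¼", "è£ħ", "asking"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `asking` pass through unchanged, so the helper can be applied to a whole list without filtering first.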