INDEX
Explanations
controversial political topics and statements related to social issues
Negative Logits
anwhile       -0.82
Shutterstock  -0.75
tremend       -0.75
ortium        -0.73
seiz          -0.73
anecd         -0.72
sadly         -0.72
mathemat      -0.70
patched       -0.70
conduc        -0.70
Positive Logits
Ĵ  1.05
¡  1.05
ĸ  1.04
ħ  1.02
ĩ  1.02
į  1.02
ĥ  1.01
¬  1.01
ľ  0.99
Ĥ  0.96
Activations Density: 0.179%