Explanations
references to controversial figures and their actions or statements
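Negative Logits
[token not shown]  -1.31
[token not shown]  -1.05
[token not shown]  -1.00
[token not shown]  -0.99
utilising  -0.99
[token not shown]  -0.95
utilised  -0.95
[token not shown]  -0.92
––  -0.91
utilise  -0.90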
Positive Logits
XNUMX  1.50
[token not shown]  1.36
̵  1.32
NUMX  1.29
🇧  0.97
⁇  0.93
.;  0.91
և  0.90
».  0.90
.:  0.89
Activations Density: 0.054%