Explanations
dates and locations expressed as cardinal numbers
negative connotations or mentions of conflict
NEGATIVE LOGITS
advertisement               -1.10
[U+00AD soft hyphen]        -1.03
Advertisements              -0.84
risome                      -0.77
″                           -0.74
…"                          -0.74
…."                         -0.71
′                           -0.69
………………                      -0.69
ADVERTISEMENT               -0.69
POSITIVE LOGITS
-                           2.98
--                          1.61
�                           1.56
---                         1.41
--                          1.35
–                           1.25
"-                          1.25
(-                          1.22
+                           1.21
-->                         1.19
Activations Density 0.061%
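
The density figure above can be read as the share of token positions on which the feature fires. The page does not define the term, so this reading is an assumption. Under it, a minimal sketch of the arithmetic follows; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of them with a nonzero activation.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density near 0.06% would mean the feature is active on roughly 6 of every 10,000 tokens, which makes it a sparse feature.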