INDEX
Explanations
unique characters or symbols such as "Ċ" or "âĢ" appearing in the text
instances of alarming or significant news events
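Symbols such as "Ċ" and "âĢ" in the first explanation look like the printable stand-ins that GPT-2 style byte-level BPE tokenizers use for raw bytes. The sketch below rebuilds that byte-to-character table and maps such token strings back to bytes. It assumes this feature's model uses the GPT-2 byte mapping, which this page does not state; `token_to_bytes` and `decode_token` are hypothetical helper names introduced only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    shift = 0
    for b in range(256):
        if b not in table:
            # Bytes without a printable form are shifted to U+0100 and above.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Assumption: the model behind this page uses the GPT-2 mapping.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map a displayed token string back to raw bytes (hypothetical helper).

    Raises KeyError for characters outside the 256-entry table.
    """
    return bytes(CHAR_TO_BYTE[c] for c in token)


def decode_token(token):
    """Decode the raw bytes as UTF-8; incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


print(token_to_bytes("Ċ"))   # the newline byte
print(token_to_bytes("âĢ"))  # first two bytes of a three-byte UTF-8 sequence
```

Under this assumption, "Ċ" is a newline and "âĢ" is the byte pair E2 80, the prefix shared by curly quotes and dashes in UTF-8. Tokens that the page already shows as decoded text fall outside the table and raise `KeyError`.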
Negative Logits
oun          -0.94
tremend      -0.85
eleph        -0.83
ò            -0.81
aditional    -0.80
exha         -0.78
occas        -0.76
practition   -0.75
exting       -0.74
councill     -0.73
Positive Logits
Í            0.78
̶            0.71
Merit        0.71
rou          0.69
Correct      0.67
rosso        0.66
使           0.66
ãĥ¯ãĥ³       0.66
1016         0.64
³³³          0.63
Activations Density 0.670%
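
The page does not define activation density. A common reading is the fraction of sampled tokens on which the feature fires, and the sketch below computes that figure. The definition, the zero threshold, and the sample values are all assumptions made for illustration, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumed definition; the dashboard does not say how density is computed.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 7 with a non-zero activation.
sample = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.9, 0.3, 1.7]
print(f"{activation_density(sample):.3%}")
```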