INDEX
Explanations
references to social media and public statements
Negative Logits
verifyException    -0.65
۲۰۱    -0.57
تضيفلها    -0.56
noDo    -0.51
    -0.48
Montague    -0.47
nonUne    -0.45
fifteenth    -0.45
Bradshaw    -0.43
monoxide    -0.42
Positive Logits
Wednesday    1.29
Wednesday    1.21
③    1.00
WEDNESDAY    0.99
WEDNESDAY    0.97
③    0.93
mercredi    0.91
wednesday    0.90
Wednesdays    0.89
onsdag    0.89
Activations Density 0.403%
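
The page does not define the density figure. One common reading, assumed here, is the fraction of sampled token positions where the feature's activation is above zero. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 4 of which have a nonzero activation.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.400%"
```

Under this reading, a density of 0.403% would mean the feature fires on roughly 4 of every 1000 sampled tokens.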