Explanations
instances where declarations or statements are made
phrases and terms related to declarations or official statements
Negative Logits
Token      Logit
RH         -0.73
alez       -0.70
dayName    -0.68
anuts      -0.68
engers     -0.66
Rh         -0.65
hov        -0.63
umbn       -0.62
Reply      -0.62
Pg         -0.62
Positive Logits
Token          Logit
bankruptcy     1.07
phas           0.96
allegiance     0.94
unequivocally  0.87
independence   0.86
victory        0.83
aloud          0.80
caliphate      0.80
ance           0.79
martial        0.79
Activations Density: 0.061%