Explanations
references to real-life events, particularly those that involve controversy or misinformation
Negative Logits
Token        Logit
soever       -0.75
龍契士       -0.69
belonged     -0.66
hawks        -0.63
moderates    -0.61
wisely       -0.60
enery        -0.59
occupied     -0.59
meanwhile    -0.58
Rossi        -0.58
Positive Logits
Token            Logit
downfall         0.97
eventual         0.88
ディ             0.85
lasting          0.81
breakdown        0.81
conclusion       0.81
demise           0.79
deaths           0.79
paralysis        0.79
deterioration    0.78
Activations Density 0.302%
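
The density figure above is most naturally read as the share of sampled token positions on which the feature is active. The dashboard does not say how it is computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all; the dashboard does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.302% would mean the feature is active on roughly 3 of every 1000 sampled tokens.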