INDEX
Explanations
phrases related to causality or influence
phrases that indicate cause and effect relationships
Negative Logits
arily     -0.78
todd      -0.72
toured    -0.71
ta        -0.70
tes       -0.69
alian     -0.68
tel       -0.66
right     -0.66
ared      -0.66
cared     -0.66
Positive Logits
confusion        0.87
confirmation     0.78
speculation      0.77
bloodshed        0.77
dismissal        0.77
extinction       0.76
widespread       0.76
stagnation       0.76
outbreaks        0.75
deterioration    0.73
Activations Density: 0.067%