INDEX
Explanations
phrases or words related to direct connections or relationships between entities or concepts
phrases explicitly stating direct relationships
Negative Logits
gerald        -0.81
glers         -0.71
mble          -0.69
ittal         -0.66
cautiously    -0.66
thoroughly    -0.65
Daily         -0.64
stal          -0.62
Daily         -0.62
ulton         -0.61
Positive Logits
contradicted  0.96
contradicts   0.83
contradict    0.80
ebted         0.79
impacted      0.77
forward       0.77
benefited     0.77
attributable  0.74
observable    0.74
implicated    0.71
Activations Density: 0.029%