INDEX
Explanations
phrases related to situational causes and effects or logical relationships
phrases that indicate reasoning or rationale
Negative Logits
awaited           -0.80
Rodham            -0.64
abled             -0.62
window            -0.61
Borough           -0.59
DISTRICT          -0.58
Booster           -0.57
Nut               -0.57
realDonaldTrump   -0.57
':                -0.56
Positive Logits
Therefore         0.85
cknowled          0.84
analogy           0.82
therefore         0.81
tera              0.77
ichever           0.77
contrast          0.73
societies         0.73
inctions          0.72
entimes           0.72
Activations Density 0.477%