INDEX
Explanations
phrases related to outcomes or results
phrases indicating potential outcomes or consequences
Negative Logits
clips        -0.65
ashes        -0.62
vati         -0.61
verages      -0.57
ahead        -0.57
discussed    -0.57
lund         -0.57
ida          -0.56
noticed      -0.55
recounted    -0.55
Positive Logits
be           1.49
Be           0.97
be           0.94
contain      0.93
resemble     0.90
asted        0.88
BE           0.86
belong       0.86
have         0.86
consist      0.85
Activations Density 0.109%