Explanations
specific instances related to decision making or personal experiences
phrases indicating decision-making or consideration processes
Negative Logits
Token      Logit
apo        -0.70
ROR        -0.62
Witches    -0.61
DN         -0.59
photos     -0.56
Lago       -0.56
uggle      -0.55
role       -0.55
ombat      -0.55
fixme      -0.55
Positive Logits
Token        Logit
finally      1.27
settled      1.01
decided      0.99
conclude     0.90
eventually   0.89
reluctantly  0.87
culminating  0.85
recons       0.83
concluded    0.83
decide       0.82
Activations Density: 0.360%