INDEX
Explanations
significant decisions or choices
significant decisions and their implications
Negative Logits
ancies       -0.74
obos         -0.70
iaries       -0.68
roofs        -0.68
phalt        -0.67
iries        -0.67
dimension    -0.66
ravings      -0.66
onyms        -0.66
assies       -0.65
Positive Logits
indeed          1.15
considering     1.11
nonetheless     1.08
compared        0.98
nevertheless    0.77
worth           0.72
because         0.71
anyway          0.70
breaker         0.70
unto            0.70
Activations Density: 0.260%