Explanations
phrases indicating a change or development in a positive direction
Negative Logits
Joined         -0.74
odied          -0.74
successors     -0.69
otype          -0.68
predecessors   -0.68
DNA            -0.67
Introduced     -0.65
issance        -0.65
inia           -0.64
lication       -0.62
Positive Logits
downhill        0.90
spir            0.79
BELOW           0.77
blurry          0.73
unfolded        0.73
murky           0.72
bleak           0.70
calmed          0.70
fluid           0.70
Thrones         0.70
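Ranked lists like the two above keep only the extreme entries at each end of a per-token score. The following Python sketch assumes the per-token scores are already available as a plain dict. The helper name `top_and_bottom` is hypothetical, and the sample uses only a small subset of the values listed above. It is not the dashboard's actual code.

```python
import heapq


def top_and_bottom(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Hypothetical helper: assumes `scores` maps each token to one float score.
    """
    # Most negative first, like the order of the Negative Logits list.
    negative = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    # Most positive first, like the order of the Positive Logits list.
    positive = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return negative, positive


# A small subset of the values listed above.
sample = {
    "Joined": -0.74,
    "successors": -0.69,
    "DNA": -0.67,
    "downhill": 0.90,
    "spir": 0.79,
    "blurry": 0.73,
}
neg, pos = top_and_bottom(sample, 2)
print(neg)  # [('Joined', -0.74), ('successors', -0.69)]
print(pos)  # [('downhill', 0.9), ('spir', 0.79)]
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the full vocabulary when only a handful of entries at each end are needed.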
Activation Density: 1.657%
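A density figure like this one is commonly read as the percentage of sampled tokens on which the feature is active. The sketch below makes two assumptions. First, "active" means an activation strictly above a threshold of zero. Second, the activations arrive as a plain list of floats. The page does not state the token sample behind 1.657%, so the input below is made-up toy data.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than `threshold` (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations: 2 of 8 tokens fire.
print(activation_density([0.0, 1.2, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0]))  # 25.0
```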