INDEX
Explanations
words related to increasing or driving up various factors or actions
Negative Logits
Token       Logit
Guard       -0.69
eal         -0.66
ftime       -0.66
inis        -0.64
bis         -0.63
sm          -0.62
til         -0.62
idelines    -0.61
ynski       -0.60
cean        -0.59
Positive Logits
Token       Logit
aside       1.06
away        0.98
forth       0.98
down        0.96
wedge       0.91
upwards     0.90
onward      0.90
upward      0.89
up          0.84
apart       0.83
Activations Density: 0.144%