INDEX
Explanations: phrases related to control and power
Negative Logits
Token        Logit
enegger      -0.79
ruary        -0.68
Recommend    -0.66
onic         -0.66
jen          -0.63
tein         -0.62
soon         -0.61
uni          -0.61
lyn          -0.60
asus         -0.60
Positive Logits
Token        Logit
eering       1.05
ership       1.03
eers         0.89
orship       0.83
levers       0.83
controlled   0.82
rador        0.78
eer          0.78
taker        0.77
rollers      0.77
Activations Density: 2.764%