INDEX
Explanations
phrases related to control and power dynamics
Negative Logits
Token        Logit
overlap      -0.14
ãĥ¬ãĥĥãĥĪ       -0.14
ond          -0.14
yat          -0.14
grave        -0.13
onders       -0.13
ioned        -0.13
antages      -0.13
192          -0.13
odynam       -0.13
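The negative-logit entry `ãĥ¬ãĥĥãĥĪ` looks like a token shown in byte-level BPE form rather than corrupted text. The sketch below assumes the model's tokenizer uses the GPT-2 style byte-to-unicode mapping (the page does not confirm this) and reverses that mapping to recover the underlying UTF-8 string. The function names are illustrative and not taken from the page.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values (0-255) to printable characters."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Assumption: `token` is a byte-level BPE string as displayed on the page."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¬ãĥĥãĥĪ"))  # prints レット
```

Under that assumption the entry corresponds to the katakana fragment レット, so it is best read as a legitimate token and not as encoding debris.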
Positive Logits
Token        Logit
destiny      0.43
fate         0.36
destin       0.36
Destiny      0.32
Fate         0.30
destino      0.29
affairs      0.27
decisions    0.27
direction    0.26
outcome      0.26
Activations Density 0.186%
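As a rough illustration of the density figure, the sketch below assumes that "Activations Density" means the fraction of sampled tokens on which the feature's activation is above zero. The page does not state its exact definition, and both the function name and the sample data are made up for the example.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: one activation value per sampled token.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the feature fires on 186 of 100,000 tokens.
sample = [1.0] * 186 + [0.0] * (100_000 - 186)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.186%
```

Read this way, a density of 0.186% would mean the feature is active on roughly 1 token in 540.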