Explanations
phrases related to dependence or reliance on entities or systems
Negative Logits
inez         -0.17
ither        -0.16
.au          -0.15
noc          -0.15
itez         -0.15
orp          -0.15
lement       -0.14
angelo       -0.14
resh         -0.14
izontally    -0.14
Positive Logits
heav          0.33
heavily       0.31
heavy         0.31
upon          0.31
heavy         0.29
Heavy         0.28
heavier       0.28
Heavy         0.28
Upon          0.26
Upon          0.26
Activation Density 0.017%