INDEX
Explanations
phrases indicating avoidance or restriction
Negative Logits
ç«          -0.15
.clock      -0.14
enders      -0.14
etz         -0.14
úa          -0.13
fov         -0.13
ijd         -0.13
Keyboard    -0.13
riterion    -0.13
rios        -0.13
Positive Logits
otron       0.16
ạnh         0.15
Diagram     0.15
tom         0.14
facto       0.14
ãĥķãĥĪ      0.14
μÎŃ         0.14
ãĥ¼ãĥij      0.14
tach        0.14
esser       0.14
Activations Density 0.173%