Explanations
the presence of phrases emphasizing relationships or connections
Negative Logits
buf        -0.15
woord      -0.14
enci       -0.14
anoi       -0.14
511        -0.14
.Emit      -0.13
Ñıж        -0.13
gaard      -0.13
oland      -0.13
Bu         -0.13
Positive Logits
bring       1.24
Bring       1.10
brings      1.10
bringing    1.10
bring       1.10
brought     1.09
Bring       1.07
bringing    0.99
Bringing    0.82
rought      0.72
Activation Density: 0.006%