INDEX
Explanations
occurrences of the word "self" in various contexts
Negative Logits
sse       -0.17
ture      -0.17
éné       -0.16
oplevel   -0.16
åĻ        -0.14
Nguyên    -0.14
lém       -0.14
Topology  -0.14
agua      -0.14
ristol    -0.14
Positive Logits
hoff      0.14
590       0.14
kil       0.14
Aires     0.14
gone      0.14
"display  0.14
911       0.14
h         0.13
zelf      0.13
ă         0.13
Activations Density 0.010%