INDEX
Explanations
the word "out" in various contexts
Negative Logits
erties      -0.17
è¡          -0.15
Moy         -0.15
lig         -0.15
cker        -0.15
ination     -0.14
ét          -0.14
variant     -0.14
ijing       -0.13
Scalars     -0.13
POSITIVE LOGITS
there       0.27
out         0.26
there       0.21
Out         0.20
THERE       0.19
There       0.17
.out        0.17
wards       0.17
dere        0.17
LOUD        0.17
Activations Density: 0.014%