INDEX
Explanations
occurrences of the word "on"
Negative Logits
iegel       -0.16
alian       -0.15
åĤ          -0.15
TagName     -0.14
131         -0.14
esser       -0.14
reetings    -0.13
acky        -0.13
u           -0.13
lax         -0.13
Positive Logits
ainer       0.18
azon        0.18
lev         0.16
atrix       0.16
OTHERWISE   0.15
egal        0.15
arget       0.15
phan        0.15
BATCH       0.14
rames       0.14
Activations Density: 0.004%