INDEX
Explanations
mentions of "Art" in various contexts
Negative Logits
erland    -0.23
er        -0.21
arts      -0.19
arts      -0.18
est       -0.17
pants     -0.16
ry        -0.16
estone    -0.15
naire     -0.15
ann       -0.15
Positive Logits
istry     0.25
ifice     0.24
illery    0.21
fully     0.18
ificial   0.17
ifacts    0.17
icular    0.17
ikel      0.17
ãĤ¥       0.16
isans     0.16
Activations Density 0.035%