INDEX
Explanations
references to burgers and related food items
Negative Logits
ourn       -0.16
ns         -0.14
avage      -0.14
ynet       -0.14
ref        -0.14
437        -0.14
Shard      -0.14
218        -0.13
fare       -0.13
verture    -0.13
Positive Logits
ëŁī        0.18
.gdx       0.16
ãĥ³ãĥĨãĤ£   0.15
ارد        0.15
entai      0.15
UTO        0.15
edom       0.15
DATED      0.14
Arms       0.14
illos      0.14
Activations Density: 0.015%