INDEX
Explanations
mentions of shops and online retail experiences
Negative Logits
auffi       -1.06
myſelf      -1.02
pleaſure    -0.96
Theſe       -0.92
ſche        -0.91
ainfi       -0.91
iſt         -0.90
ſtate       -0.90
eſt         -0.90
greateſt    -0.88
Positive Logits
shop        0.76
shop        0.75
Shop        0.70
Shop        0.67
ACCESS      0.62
room        0.62
hop         0.57
.           0.57
exit        0.56
↵↵          0.56
Activations Density: 0.085%