INDEX
Explanations
mentions of the word "Sw" followed by a single character and then a number
mentions of the word "Sw" followed by various endings, indicating a focus on specific entities or brands
Negative Logits
Token          Logit
OPLE           -0.77
EStream        -0.72
BILITY         -0.70
cised          -0.70
66666666       -0.67
theater        -0.66
preferential   -0.66
vre            -0.66
partial        -0.65
代             -0.65
Positive Logits
Token          Logit
immer          1.14
inton          1.13
imming         1.13
addle          1.11
ifty           1.11
artz           1.10
indle          1.07
ollen          1.06
allow          1.05
inging         1.04
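
One way to read the positive logits, assuming the feature fires on a "Sw" prefix as the explanations suggest, is to join that prefix to each listed token and look at the resulting strings. The sketch below is illustrative only: the `PREFIX` constant and the variable names are hypothetical, and the token list is copied from the Positive Logits entries. It does not query any model.

```python
# Tokens copied from the Positive Logits list, in order of listed logit.
positive_tokens = [
    "immer", "inton", "imming", "addle", "ifty",
    "artz", "indle", "ollen", "allow", "inging",
]

# Assumption: the prefix comes from the explanations ("Sw" followed by an ending).
PREFIX = "Sw"

# Join the prefix to each token to see which strings the feature would favor.
completions = [PREFIX + token for token in positive_tokens]

for word in completions:
    print(word)
```

Under that assumption, every positive-logit token yields a recognizable word or surname beginning with "Sw", which agrees with the second explanation (various endings after "Sw") more clearly than with the first (a single character and then a number).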
Activations Density: 0.007%