INDEX
Explanations
references to swimming pools and related amenities
Negative Logits
confess    -0.17
dden       -0.15
vens       -0.15
ewart      -0.14
çī©        -0.14
SHIFT      -0.14
Wiki       -0.14
دÙĩÙħ      -0.13
-duty      -0.13
orque      -0.13
POSITIVE LOGITS
erman      0.19
rch        0.17
bed        0.17
swimming   0.16
룬         0.16
pools      0.16
side       0.15
ɵ          0.15
pool       0.15
geh        0.15
Activations Density 0.029%