INDEX
Explanations
references to groups of people or audiences
Negative Logits
ï¸ı      -0.22
zelf     -0.18
utom     -0.16
/do      -0.16
offee    -0.15
ity      -0.15
lop      -0.15
clado    -0.14
ce       -0.14
enny     -0.14
Positive Logits
ourced       0.27
ourcing      0.24
-control     0.17
ings         0.17
source       0.16
istics       0.16
favorites    0.16
oucher       0.16
796          0.15
favourites   0.15
Activations Density: 0.024%