Explanation
References to predator-prey interactions
Negative Logits
aths           -0.14
pornografia    -0.14
-commercial    -0.14
prostitu       -0.14
ampo           -0.13
Atl            -0.13
dwar           -0.13
ÌĨ             -0.13
/UIKit         -0.13
amine          -0.13
Positive Logits
predators      0.28
pred           0.26
predator       0.24
predatory      0.23
Pred           0.22
prey           0.22
Predator       0.21
Pred           0.20
pred           0.19
attacks        0.19
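The page does not say how these two lists are produced. One common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most positive and most negative entries. A minimal NumPy sketch; `decoder_direction`, `unembed`, and `vocab` are hypothetical inputs:

```python
import numpy as np

def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption (not from the source): decoder_direction has shape (d_model,),
    unembed has shape (d_model, vocab_size), and vocab maps index -> token string.
    """
    logits = decoder_direction @ unembed      # shape (vocab_size,)
    order = np.argsort(logits)                # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional model and a 4-token vocabulary.
direction = np.array([1.0, 0.0])
unembed = np.array([[0.3, -0.2, 0.1, 0.0],
                    [5.0, 5.0, 5.0, 5.0]])
vocab = ["prey", "dwar", "pred", "x"]
positive, negative = top_logit_effects(direction, unembed, vocab, k=2)
print(positive)  # [('prey', 0.3), ('pred', 0.1)]
print(negative)  # [('dwar', -0.2), ('x', 0.0)]
```

Because the projection is linear, a token such as "predators" scoring 0.28 would mean that activating this feature by one unit pushes that token's logit up by about 0.28, under the assumption above.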
Activation density: 0.083%
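Activation density is read here as the share of tokens on which the feature fires, so 0.083% would correspond to roughly 83 tokens in every 100,000. A minimal sketch under that assumption, treating any activation above a hypothetical `threshold` (default 0) as firing:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption (not from the source): "firing" means activation > threshold.
    """
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# 83 active tokens out of 100,000 gives a density of about 0.083%.
acts = np.zeros(100_000)
acts[:83] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.083%
```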