INDEX
Explanations
references to various forms of abuse
instances of the word "abuse" in various contexts
Negative Logits
travel        -0.78
cil           -0.75
pard          -0.75
izen          -0.72
printed       -0.70
views         -0.69
eday          -0.69
ailed         -0.66
adventurer    -0.66
traveler      -0.66
Positive Logits
abuse         0.94
abusing       0.84
abuse         0.81
abused        0.79
perpetrated   0.79
Abuse         0.76
inflicted     0.75
abusers       0.75
ル            0.74
abuses        0.71
Activations Density 0.034%