Explanations
arguments related to legal defenses and claims
Negative Logits
Doll          -0.20
doll          -0.18
Dag           -0.18
Dirt          -0.18
dolls         -0.18
dispatcher    -0.17
Dortmund      -0.17
dishwasher    -0.17
ippi          -0.17
Dawson        -0.17
Positive Logits
defense       0.74
defence       0.70
-defense      0.61
defense       0.60
Defense       0.59
Defence       0.55
defend        0.54
defenses      0.54
Defense       0.53
def           0.50
Activations Density 0.172%