INDEX
Explanations
physical actions and violence-related terms
Negative Logits
entities    -0.62
typename    -0.61
Bill        -0.61
Gull        -0.60
兮          -0.57
luri        -0.56
贼          -0.55
cemment     -0.54
C           -0.53
alej        -0.53
Positive Logits
strike      1.47
strikes     1.40
striking    1.38
hitting     1.35
Strikes     1.34
Schlag      1.33
Strike      1.30
Strike      1.29
struck      1.27
hit         1.25
Activations Density: 0.258%