Explanations
words related to damage and harm caused by various factors
Negative Logits
CCR          -0.17
?option      -0.16
Insensitive  -0.16
icine        -0.15
ä»ģ          -0.15
dump         -0.14
BlockSize    -0.14
ä¸ģ缮       -0.14
Dynam        -0.14
icker        -0.14
Positive Logits
damage       0.58
harm         0.45
damage       0.44
Damage       0.42
damages      0.42
DAMAGE       0.42
dam          0.40
injury       0.39
amage        0.39
Damage       0.39
Activations Density: 0.112%