Explanations
references to damage or harm
damage and its consequences
Negative Logits
Према           -0.69
rega            -0.48
tet             -0.48
tartalomajánló  -0.48
noirs           -0.46
otho            -0.45
zee             -0.45
/*---           -0.45
Ours            -0.45
Übung           -0.44
Positive Logits
damage          1.21
Damage          1.18
DAMAGE          1.11
Damage          1.08
damage          1.08
damages         1.04
Damages         1.01
Dama            0.98
Damaged         0.98
Dama            0.97
Activations Density 0.134%