Explanations
negatively affects or opposes
Negative Logits
Okay          0.40
Better        0.39
Absence       0.38
Add           0.35
okay          0.34
absence       0.34
X             0.33
Te            0.33
Partial       0.33
जुड़           0.33
Positive Logits
undermines      0.50
正常的          0.49
undermine       0.46
unnecessarily   0.46
sanctity        0.46
needlessly      0.46
undermining     0.43
原本            0.43
précieux        0.42
innocence       0.42
Activations Density 0.268%