INDEX
Explanations
words related to penalties or punishment
references to penalization or punishment
Negative Logits
aeda                  -0.81
through               -0.73
worth                 -0.72
quickShipAvailable    -0.70
elf                   -0.68
lycer                 -0.66
overs                 -0.65
afety                 -0.64
ynthesis              -0.64
RIS                   -0.64
Positive Logits
ized                  1.21
penal                 1.07
ised                  0.96
izes                  0.93
izing                 0.92
ization               0.91
ising                 0.81
ize                   0.79
eties                 0.78
punished              0.76
Activations Density 0.019%