INDEX
Explanations
comparisons between different concepts or entities
comparative phrases that highlight moral or ethical considerations
Negative Logits
icken     -0.72
eds       -0.70
Bern      -0.67
âĢIJ       -0.66
encers    -0.65
Domain    -0.62
medium    -0.59
hiba      -0.59
bags      -0.58
engine    -0.58
Positive Logits
slapping     0.75
having       0.73
brute        0.71
assass       0.71
abol         0.70
anything     0.69
removing     0.68
rewriting    0.68
blasphemy    0.68
curing       0.68
Activations Density: 0.199%