Explanations
comparisons between different concepts or entities
comparative phrases or constructions that evaluate differences between entities or concepts
Negative Logits
Token          Logit
etsk           -0.69
ista           -0.69
iken           -0.66
assadors       -0.66
istar          -0.65
encers         -0.65
eni            -0.64
Instructor     -0.63
rica           -0.62
rend           -0.61
Positive Logits
Token          Logit
fingerprints   0.89
homosexuality  0.81
murder         0.77
wildfire       0.76
lihood         0.76
incest         0.76
circumcision   0.75
ours           0.74
any            0.73
alcoholism     0.72
Activations Density 0.258%