Explanations
words related to consistency and being consistent with rules or values
references to consistency in policies, practices, or behaviors
Negative Logits
doors      -0.82
olit       -0.79
worms      -0.75
mong       -0.75
crow       -0.73
doms       -0.72
thur       -0.71
stals      -0.70
tu         -0.70
rection    -0.69
Positive Logits
consistency    0.88
itarian        0.88
ibilities      0.83
offender       0.82
iated          0.81
ually          0.79
ibly           0.78
icut           0.78
aneously       0.77
ively          0.77
Activations Density: 0.033%