INDEX
Explanations
words related to consistency and conformity
terms related to consistency and reliable adherence to standards
Negative Logits
doors     -0.85
olit      -0.77
OTOS      -0.75
rection   -0.74
worms     -0.72
pez       -0.71
crow      -0.71
stals     -0.71
kamp      -0.70
pathy     -0.69
Positive Logits
itarian        0.90
consistency    0.83
ially          0.81
offender       0.80
ively          0.80
iated          0.80
ently          0.79
ibly           0.78
ITY            0.77
GoldMagikarp   0.76
Activations Density 0.028%