INDEX
Explanations
phrases related to judgment or consequences being imposed on individuals
phrases and terms related to being judged, labeled, or held accountable
Negative Logits
integration      -0.73
convergence      -0.73
Stability        -0.68
Integration      -0.67
succession       -0.65
Reconstruction   -0.64
divergence       -0.64
Confeder         -0.63
unification      -0.63
formation        -0.62
Positive Logits
aback            0.97
ardless          0.79
anked            0.77
inged            0.77
ifferent         0.76
angering         0.75
ãĤ¼              0.75
ptin             0.75
orthy            0.74
otyp             0.73
Activations Density 0.331%