INDEX
Explanations
terms related to legal and procedural contexts, particularly those involving immigration, judicial systems, and socio-political themes
Negative Logits
situation    -0.64
dataset      -0.63
affair       -0.60
propOrder    -0.59
situation    -0.59
aspect       -0.58
thing        -0.57
experiment   -0.55
poem         -0.52
palette      -0.52
Positive Logits
coverage     0.52
antism       0.52
legislation  0.52
etiquette    0.51
operations   0.50
ownership    0.50
circles      0.50
gedrag       0.49
standards    0.49
purposes     0.49
Activations Density: 2.868%