Explanations
words related to rules or limitations imposed on a particular group or activity
NEGATIVE LOGITS
Token          Logit
mberg          -0.72
past           -0.68
Zeit           -0.68
Generations    -0.67
fortune        -0.65
uni            -0.65
psc            -0.65
nexus          -0.64
vironment      -0.64
ilus           -0.64
POSITIVE LOGITS
Token          Logit
imposed        1.10
restricting    1.04
restricts      0.95
restrictions   0.94
prohibiting    0.94
prohibited     0.92
inhib          0.90
restriction    0.87
enforced       0.83
limiting       0.83
Activations Density: 0.040%