Explanations
terminology related to obstacles or hindrances
terms related to restrictions and obstacles
Negative Logits
Stars        -0.73
Values       -0.72
Truth        -0.70
Score        -0.67
Profession   -0.66
Stories      -0.65
headlined    -0.65
Images       -0.63
Featured     -0.62
LV           -0.60
Positive Logits
inhib         0.84
restricting   0.83
preventing    0.80
downs         0.79
inhibition    0.78
imposed       0.76
ption         0.76
DOWN          0.75
gap           0.74
halting       0.71
Activations Density 0.175%
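As a rough guide to reading the density figure: assuming it is the fraction of sampled tokens on which the feature's activation is strictly positive (an assumption; the page does not define it), a density of 0.175% means the feature fires on about 7 of every 4,000 tokens. The helper below, `activation_density`, is hypothetical and illustrates only that definition, not the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumes (hypothetically) that density counts activations strictly
    greater than zero; the dashboard may use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 7 firing tokens out of 4,000 sampled tokens.
acts = [0.0] * 3993 + [1.2] * 7
print(f"{activation_density(acts):.3%}")  # 0.175%
```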