INDEX
Explanations
words related to actions or concepts that involve deterring, regulating, or deciding on something
words related to prevention or deterrence
Negative Logits
appropri       -0.60
sucks          -0.58
viz            -0.57
HAL            -0.56
coworkers      -0.55
Haz            -0.55
Posts          -0.54
quirks         -0.54
subordinates   -0.54
seams          -0.54
Positive Logits
red            1.67
ered           1.58
ring           1.52
rer            1.49
mented         1.44
ted            1.39
ering          1.37
ance           1.37
anced          1.35
ant            1.34
Activations Density 0.439%
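
The page does not define the density figure. One common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. Under that assumption, a minimal sketch of the calculation might look like the following; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

On this reading, a density of 0.439% would mean the feature fires on roughly 4 to 5 of every 1000 sampled tokens.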