INDEX
Explanations
positive actions or attributes
phrases that indicate permission, opportunity, or flexibility
Negative Logits
Cth          -0.64
merce        -0.60
ittens       -0.59
ale          -0.57
Epidem       -0.56
Pipeline     -0.56
anish        -0.56
Zimbabwe     -0.56
seams        -0.56
Consortium   -0.55
Positive Logits
thood         0.83
choice        0.74
opportunity   0.70
chance        0.70
ãĥİ           0.68
license       0.68
ãĥĸ           0.65
ppo           0.61
freedom       0.61
vik           0.60
Activations Density 0.415%