Explanations
gerunds and words related to roles and actions in a structured context
Negative Logits
Token     Logit
ings      -0.23
guns      -0.18
ality     -0.16
lt        -0.16
ng        -0.15
zb        -0.15
xs        -0.15
zh        -0.15
oul       -0.15
iw        -0.15
Positive Logits
Token     Logit
factor    0.26
factors   0.25
redient   0.23
redients  0.22
factor    0.22
Factors   0.21
-factor   0.20
Factor    0.19
_FACTOR   0.18
force     0.18
Activations Density: 0.142%