Explanations
phrases related to control or authority
phrases emphasizing the concept of allowing or permitting actions or behaviors
Negative Logits
Token       Logit
oppable     -0.83
Languages   -0.72
holiest     -0.69
lihood      -0.67
atana       -0.67
millenn     -0.66
agher       -0.66
querque     -0.63
cumbers     -0.63
edly        -0.63
Positive Logits
Token       Logit
tered        1.12
tering       0.92
slip         0.87
icia         0.86
enne         0.75
loose        0.75
us           0.73
itia         0.69
go           0.67
me           0.67
Activation Density: 0.028%