Explanations
words related to cooperation and shared goals
Negative Logits
Token      Logit
suppress   -0.16
trs        -0.15
aria       -0.15
ienne      -0.15
LOY        -0.15
ead        -0.15
ivery      -0.15
abad       -0.14
abb        -0.14
ród        -0.14
Positive Logits
Token      Logit
being      0.31
getting    0.23
having     0.22
being      0.21
making     0.20
becoming   0.20
doing      0.19
ώντας      0.19
Being      0.18
correl     0.18
Activations Density 0.435%