INDEX
Explanations
queries or requests for assistance regarding tasks or actions
Negative Logits
immel    -0.16
kie      -0.14
noch     -0.14
yleft    -0.14
prec     -0.14
'        -0.13
att      -0.13
‘        -0.13
cyber    -0.13
mah      -0.13
Positive Logits
kind     0.27
like     0.22
Kind     0.21
yeah     0.21
kind     0.20
_kind    0.20
Kind     0.19
.kind    0.19
Yeah     0.18
agher    0.18
Activations Density 0.001%