INDEX
Explanations
phrases that indicate instructions or actions
Negative Logits
Token       Logit
sky         -0.15
nt          -0.14
mente       -0.14
nection     -0.13
Question    -0.13
where       -0.13
Choice      -0.13
_FOUND      -0.13
ane         -0.13
ize         -0.13
Positive Logits
Token        Logit
ptal          0.21
ekim          0.18
adil          0.18
learn         0.17
ombs          0.17
xico          0.17
accom         0.16
further       0.16
Äł            0.16
complement    0.15
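Lists like the two above are usually produced by projecting a feature's decoder direction through the model's unembedding matrix and then keeping the most negative and most positive entries. The page does not say how its values were computed, so treat the following as a minimal sketch under that assumption. `top_logit_effects`, `decoder_direction`, and `unembed` are hypothetical names, and the sketch uses plain Python lists rather than any particular library.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: direct logit effect of one feature on every token.

    decoder_direction: length d_model (assumed decoder row for the feature)
    unembed: d_model rows, each of length vocab_size (assumed W_U)
    Returns (k most negative, k most positive) as (token, logit) pairs.
    """
    vocab_size = len(vocab)
    d_model = len(decoder_direction)
    # Logit for token j is the dot product of the direction with column j of W_U.
    logits = [
        sum(decoder_direction[d] * unembed[d][j] for d in range(d_model))
        for j in range(vocab_size)
    ]
    # Sort token indices by logit, smallest first.
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Walk the sorted order backwards so the largest logit comes first.
    positive = [(vocab[j], logits[j]) for j in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional model, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
direction = [1.0, 0.0]
unembed = [[0.2, -0.1, 0.05, -0.3], [9.0, 9.0, 9.0, 9.0]]
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print(neg)  # [('d', -0.3), ('b', -0.1)]
print(pos)  # [('a', 0.2), ('c', 0.05)]
```

With a real model you would pass the feature's actual decoder row and the full unembedding matrix. In that setting a vectorized matrix product is far faster than these Python loops.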
Activation Density: 0.042%
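Activation density is read here as the share of tokens in a sample on which the feature fires. The sketch below assumes a token counts as active when its activation is strictly above a threshold of 0. The dashboard may use a different threshold or a different token sample, and `activation_density` is a hypothetical helper name. A density of 0.042% corresponds to roughly 42 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: 42 active tokens out of 100,000.
acts = [0.0] * 99_958 + [0.5] * 42
density = activation_density(acts)
print(f"{density:.3%}")  # 0.042%
```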