INDEX
Explanations
casual conversational phrases and transitions
Negative Logits
Alright     -0.15
æ¬          -0.14
ĥn          -0.13
uala        -0.13
einfach     -0.13
tright      -0.13
__).        -0.13
eget        -0.13
ç½          -0.13
imson       -0.13
Positive Logits
note        0.19
speaking    0.19
unrelated   0.18
don         0.18
yes         0.17
Speaking    0.17
Speaking    0.17
rex         0.16
did         0.15
btw         0.15
Activations Density: 0.135%