INDEX
Explanations
frequent discourse markers or transitional phrases
Negative Logits
hint      -0.16
emouth    -0.15
anca      -0.15
à¹īาà¸ĩ    -0.14
utter     -0.14
icon      -0.14
riz       -0.14
darn      -0.14
hook      -0.14
oga       -0.14
Positive Logits
onto      0.37
onto      0.33
enough    0.30
Enough    0.27
Ont       0.26
Ont       0.24
back      0.22
moving    0.21
moving    0.21
Enough    0.21
Activations Density: 0.092%