Explanations
phrases that express observation or inquiry
Negative Logits
休        -0.16
agas      -0.15
onet      -0.15
いる      -0.14
itches    -0.14
corp      -0.14
seed      -0.14
freeze    -0.14
YSTEM     -0.13
irk       -0.13
Positive Logits
if        0.35
whether   0.25
if        0.23
how       0.20
what      0.20
nếu       0.18
если      0.17
about     0.17
wenn      0.17
_if       0.17
Activations Density 0.023%