INDEX
Explanations
phrases indicating progress or completion of a task
Negative Logits
lus        -0.16
NCY        -0.15
ë°ķ        -0.15
èĻİ        -0.15
dorf       -0.14
άβ         -0.14
anness     -0.14
ipsis      -0.14
Composite  -0.14
AWN        -0.14
Positive Logits
spare  0.23
go     0.21
go     0.20
worry  0.18
ermo   0.18
(go    0.17
Go     0.17
-go    0.17
Go     0.16
ercul  0.16
Activations Density 0.044%