INDEX
Explanations
phrases related to taking action or making decisions
Negative Logits
terness    -0.76
çīĪ        -0.69
nis        -0.69
çͰ         -0.66
adr        -0.66
Ĥİ         -0.64
unknown    -0.64
ernels     -0.63
yssey      -0.63
almost     -0.62
Positive Logits
anymore         1.23
adequately      1.05
properly        0.99
enough          0.97
sufficiently    0.94
altogether      0.93
adequate        0.88
sufficient      0.86
correctly       0.82
timely          0.80
Activations Density: 0.383%