Explanations
phrases expressing apologies or regret
Negative Logits
Token        Logit
san          -0.15
iek          -0.15
bro          -0.15
erne         -0.14
sut          -0.14
ed           -0.14
reh          -0.13
sabotage     -0.13
Settlement   -0.13
/material    -0.13
Positive Logits
Token                     Logit
tard                      0.18
inconvenience             0.17
late                      0.15
813                       0.15
冒 (raw: åĨĴ)             0.14
tablename                 0.14
不得 (raw: ä¸įå¾Ĺ)        0.14
461                       0.14
ptype                     0.14
ながら (raw: ãģªãģĮãĤī)   0.14
Activations Density: 0.065%