INDEX
Explanations
first-person pronouns and expressions of uncertainty or speculation
Negative Logits
ルフ      -0.14
adx       -0.14
acey      -0.14
@show     -0.14
524       -0.14
起        -0.13
hood      -0.13
ies       -0.13
åŀ        -0.13
IVED      -0.13
Positive Logits
don       0.60
don       0.50
Don       0.44
doesn     0.44
dont      0.43
Don       0.43
DON       0.42
不知道    0.39
Dun       0.36
dun       0.36
Activations Density 0.045%