Explanations
the frequency of first-person pronouns and expressions of personal experiences or opinions
Negative Logits
Token        Logit
DataURL      -0.15
Rahman       -0.14
972          -0.14
undef        -0.14
="../../../  -0.14
greg         -0.14
ache         -0.13
さま         -0.13
ندا          -0.13
okable       -0.13
Positive Logits
Token        Logit
def          0.20
seen         0.19
rarity       0.17
barley       0.17
Dont         0.17
Seen         0.15
noticed      0.15
щ            0.15
ve           0.15
meant        0.15
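Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and taking the largest and smallest entries. The page does not say how these particular lists were computed, so the following is only a minimal NumPy sketch under that assumption. `decoder_dir`, `W_U` (shape `(d_model, vocab_size)`), `vocab`, and `top_logit_effects` are hypothetical names used for illustration.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative direct logit effects.

    Assumptions (hypothetical, not taken from the dashboard):
      decoder_dir: (d_model,) decoder direction of the feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list mapping token id -> token string
    """
    # Direct effect of the feature direction on every token's logit.
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # token ids sorted from lowest to highest logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative
```

With a real model, `vocab` would come from the tokenizer. Tokens with leading spaces or byte-level encodings may print differently from how they appear in the lists above.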
Activation Density: 0.251%
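Activation density is typically the fraction of tokens in a sampled corpus on which the feature fires. A density of 0.251% therefore means the feature is active on roughly 2.5 of every 1,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero (some tools use a small positive threshold instead) and that `acts` holds one activation per token. `activation_density` and `format_density` are hypothetical helpers.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: `acts` is a 1-D array with one activation per token, and a
    token counts as active when its activation is strictly above `threshold`.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


def format_density(density):
    """Render a fraction as a percentage with three decimals, e.g. 0.00251 -> '0.251%'."""
    return f"{100 * density:.3f}%"
```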