Explanations
pronouns ('we', 'us', 'our') along with suggestions or directives
references to collective or group actions and opinions
Negative Logits
è£ıè              -0.74
ãĤ´ãĥ³            -0.71
DAY               -0.71
è£ıè¦ļéĨĴ        -0.68
REDACTED          -0.67
¿½                -0.64
Verge             -0.62
practition        -0.60
WARE              -0.60
ForgeModLoader    -0.58
Positive Logits
're               1.63
've               1.40
gotta             1.14
'll               1.07
haven             1.06
wanna             1.04
don               1.04
want              0.99
kind              0.95
have              0.93
Activations Density: 0.264%