INDEX
Explanations
names or mentions of individuals
the pronoun "I"
Negative Logits
Valkyrie              -0.73
*/(                   -0.72
convol                -0.72
Haunted               -0.71
skirts                -0.71
alternate             -0.70
colleg                -0.68
Compass               -0.67
ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³    -0.67
mileage               -0.66
Positive Logits
ye      1.16
orno    1.12
ogi     1.05
acs     1.05
edi     1.03
oga     0.99
uli     0.99
anni    0.99
imi     0.98
wa      0.97
Activations Density 0.045%