INDEX
Explanations
pronouns and verbs referring to actions performed by individuals
references to specific individuals or characters
Negative Logits
è£ıè       -0.76
DAY        -0.74
âĦ¢:       -0.68
grave      -0.68
Alas       -0.67
ãĥ¤        -0.65
Sao        -0.63
ĵĺ         -0.62
Unicorn    -0.62
ielding    -0.59
Positive Logits
zbollah    1.14
'll        1.08
're        1.08
[          1.05
ain        1.04
gotta      0.99
didn       0.97
got        0.97
've        0.94
mathemat   0.94
Activations Density 0.218%