Explanations
references to authorship and the act of writing
Negative Logits
ew        -0.17
arts      -0.15
ayers     -0.15
Rak       -0.15
ey        -0.15
-eyed     -0.15
βά        -0.15
eners     -0.15
-haired   -0.15
اتی       -0.15
Positive Logits
ship      0.20
UPPORTED  0.17
itative   0.16
YSTEM     0.16
lient     0.16
ifold     0.15
rego      0.15
ignum     0.15
upported  0.14
ifice     0.14
Activations Density: 0.027%