INDEX
Explanations
references to authorship and author-related concepts
Negative Logits
Token    Logit
ary      -0.19
elyn     -0.16
oyo      -0.16
bral     -0.16
ayers    -0.16
emens    -0.15
yor      -0.15
ãģ°      -0.15
weg      -0.15
edd      -0.14
Positive Logits
Token       Logit
ship        0.36
itative     0.35
itarian     0.30
izations    0.27
ing         0.26
izes        0.24
ised        0.24
itat        0.23
ial         0.23
SHIP        0.22
Activations Density 0.030%