INDEX
Explanations
references to authors and their works, especially books
Negative Logits
une          -0.16
Clip         -0.15
phan         -0.14
partment     -0.14
Biography    -0.14
Shared       -0.13
Zu           -0.13
isolated     -0.12
technicians  -0.12
directors    -0.12
Positive Logits
book   0.52
books  0.40
book   0.39
-book  0.38
кни    0.36
libro  0.35
livro  0.34
_book  0.33
书     0.33
(book  0.32
Activations Density: 0.226%