INDEX
Explanations
references to tables or structured content in text
mentions of tables of contents
Negative Logits
vernment      -0.85
Directorate   -0.71
alez          -0.68
Mehran        -0.67
ovich         -0.65
imal          -0.64
qua           -0.64
adobe         -0.63
chancellor    -0.63
rily          -0.63
Positive Logits
cloth         1.62
top           1.04
au            1.03
aux           0.99
tops          0.97
scraps        0.96
manners       0.93
poons         0.93
table         0.91
poon          0.87
Activations Density: 0.026%