INDEX
Explanations
frequent nouns and pronouns, suggesting a focus on identifying relationships between subjects and their actions or characteristics
Negative Logits
tight          -0.15
atos           -0.15
imeo           -0.14
antim          -0.14
chos           -0.14
innocent       -0.14
anno           -0.14
нг             -0.14
Seymour        -0.14
/Foundation    -0.14
Positive Logits
ibil           0.16
ird            0.16
isson          0.15
Equality       0.14
ilton          0.14
aggi           0.14
svens          0.14
.pix           0.14
çĿ             0.14
çĿ             0.14
Activations Density: 0.030%