Explanations
references to film titles and awards
Negative Logits
Token     Logit
Fashion   -0.15
ÑĽ        -0.15
esser     -0.15
Leban     -0.14
ISBN      -0.14
porter    -0.14
tura      -0.14
arem      -0.13
iar       -0.13
ÅĻ        -0.13
Positive Logits
Token     Logit
foi       0.29
te        0.26
era       0.23
estava    0.23
tinha     0.21
itou      0.19
recebe    0.19
fe        0.19
trou      0.19
fic       0.19
Activations Density: 0.008%