INDEX
Explanations
references to film-related terms like movie collections or critical reviews
references to the term "critique" or variations thereof
Negative Logits
velt
-0.76
noon
-0.73
orthy
-0.72
loo
-0.71
Leary
-0.68
gypt
-0.66
pires
-0.65
¿½
-0.65
XT
-0.64
pora
-0.63
Positive Logits
erion    1.43
ique     1.19
iques    1.15
icism    1.13
eria     1.05
iqu      1.00
Crit     0.94
osure    0.88
ically   0.86
icals    0.83
Activations Density 0.006%
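
As a rough illustration of what a density figure like the one above could mean, the sketch below computes the percentage of tokens on which a feature's activation is above zero. It assumes density is defined as that share of active tokens; the function name `activation_density` and the sample activations are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 3 active tokens out of 50,000.
sample = [0.0] * 49_997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.006%"
```

Under this assumed definition, a density of 0.006% would correspond to the feature firing on roughly 6 tokens in every 100,000.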