INDEX
Explanations
references to film analysis and critique
Negative Logits
олом      -0.17
edom      -0.16
isson     -0.15
Hook      -0.15
Flex      -0.15
urette    -0.14
orc       -0.14
indo      -0.14
iqueta    -0.14
QA        -0.14
Positive Logits
Criterion  0.23
Fell       0.23
Criterion  0.21
Persona    0.20
Wel        0.20
Abbas      0.20
Ing        0.18
Bicycle    0.18
Feder      0.18
Citizen    0.18
Activations Density: 0.068%