INDEX
Explanations
references to films
Negative Logits
Token        Logit
Sever        -0.59
Sever        -0.57
aze          -0.56
moreland     -0.54
severance    -0.53
stanga       -0.52
anzo         -0.52
angler       -0.52
épar         -0.52
Crunch       -0.52
Positive Logits
Token        Logit
film         1.83
Film         1.68
Film         1.66
film         1.64
FILM         1.52
FILM         1.45
films        1.41
Films        1.22
films        1.18
Films        1.14
Activations Density 0.015%