Explanations
references to films and their qualities or attributes
Negative Logits
trône        -0.88
löyty        -0.70
affez        -0.69
récon        -0.68
kysy         -0.68
vétérina     -0.67
ırlı         -0.66
compréhen    -0.66
étrang       -0.66
ANTAGE       -0.66
Positive Logits
film         2.17
films        2.03
FILM         1.95
Film         1.94
film         1.86
Film         1.76
Films        1.67
FILM         1.66
films        1.61
Films        1.54
Activations Density: 0.042%