INDEX
Explanations
references to TV shows and their characteristics
Negative Logits
movie      -0.78
film       -0.71
movies     -0.69
película   -0.63
filme      -0.61
films      -0.56
film       -0.56
Movie      -0.56
movie      -0.56
電影       -0.55
Positive Logits
houſe      0.87
ſtate      0.83
ſche       0.79
pleaſure   0.78
ſtre       0.77
staffel    0.77
raiſ       0.77
purpoſe    0.76
itſelf     0.75
iſt        0.74
Activations Density 0.087%