INDEX
Explanations
the word "genre" with high activation values
references to different genres in media
Negative Logits
urion       -0.85
riel        -0.84
Lumpur      -0.80
amen        -0.71
ermanent    -0.69
erald       -0.66
Wilhelm     -0.64
Uz          -0.64
Lama        -0.63
administ    -0.63
Positive Logits
genre       0.82
genres      0.81
fiction     0.76
ologies     0.75
genre       0.74
allo        0.73
¥µ          0.71
juices      0.69
adelphia    0.68
ĸļ          0.68
Activations Density 0.018%