Explanations
mentions of literary genres
Negative Logits
urion       -0.86
riel        -0.77
loo         -0.77
Lumpur      -0.76
administ    -0.76
vic         -0.70
amen        -0.69
ilon        -0.66
grad        -0.66
erald       -0.66
Positive Logits
¥µ           0.87
fiction      0.86
conventions  0.84
tropes       0.83
genres       0.83
genre        0.82
mash         0.81
ologies      0.80
icity        0.80
genre        0.78
Activations Density: 0.032%