INDEX
Explanations
mentions of literature
references to literature
Negative Logits
fty             -0.68
twitch          -0.66
addafi          -0.65
por             -0.64
Maurit          -0.62
Bel             -0.62
adjust          -0.61
Gry             -0.60
ermanent        -0.59
oland           -0.58
Positive Logits
literature      0.99
istry           0.77
emis            0.76
DragonMagazine  0.74
Literature      0.74
laureate        0.72
itatively       0.70
RELE            0.70
writ            0.70
reading         0.68
Activations Density 0.015%