INDEX
Explanations
references to a specific fantasy world or franchise
references to a specific fictional place and its inhabitants
Negative Logits
mable     -0.74
thood     -0.72
creen     -0.72
flix      -0.71
thora     -0.71
itude     -0.69
iciary    -0.69
idav      -0.69
lihood    -0.66
achev     -0.65
Positive Logits
Reviewed      0.76
rian          0.75
oute          0.72
rosso         0.68
Chronicles    0.67
Worlds        0.67
Beer          0.66
sbm           0.66
din           0.64
Made          0.64
Activations Density: 0.034%