INDEX
Explanations
names related to a specific popular animated movie franchise
references to specific animated films and franchises
Negative Logits
̶           -0.78
terday      -0.71
OPLE        -0.66
uberty      -0.65
odore       -0.64
udence      -0.64
Sapphire    -0.60
FUL         -0.60
Sinclair    -0.60
Jav         -0.60
Positive Logits
Gund        1.32
elta        1.05
emonium     0.89
anium       0.88
otal        0.84
ecided      0.84
ersen       0.82
proport     0.79
ļé          0.77
nodd        0.77
Activations Density 0.005%