Explanations
references to specific individuals, particularly actors and historical figures linked to the entertainment industry
Negative Logits
á»ijt     -0.16
erer     -0.16
MBOL     -0.15
sey      -0.15
arium    -0.14
ulaire   -0.14
uguay    -0.14
äch      -0.14
erot     -0.14
uchs     -0.14
Positive Logits
jamin    0.20
utzer    0.19
amins    0.19
volent   0.18
fits     0.18
ifact    0.16
uen      0.16
ITO      0.16
ito      0.15
Ven      0.15
Activations Density: 0.019%