Explanations
significant mentions of identity and cultural themes related to race and history
Negative Logits
èª        -0.17
llum      -0.16
ackbar    -0.15
リーズ    -0.15
رى        -0.15
tm        -0.15
Torch     -0.15
Rod       -0.14
Род       -0.14
leigh     -0.14
Positive Logits
Cel          0.29
Morrison     0.24
Cel          0.24
Sofia        0.22
Corinthians  0.22
blues        0.21
Ce           0.20
Mister       0.20
Song         0.20
cel          0.18
Activations Density 0.002%