INDEX
Explanations
references to the Star Wars franchise, particularly its lore and controversies
Negative Logits
isay        -0.17
ĺ           -0.17
istical     -0.13
_CPU        -0.13
stereotype  -0.13
åħ¨åĽ½      -0.13
abox        -0.13
665         -0.13
amo         -0.12
ritt        -0.12
Positive Logits
canon       0.48
canonical   0.38
Canon       0.37
Canon       0.36
cannon      0.34
canonical   0.34
Canonical   0.30
Canonical   0.29
continuity  0.28
lore        0.27
Activations Density 0.290%