INDEX
Explanations
references to humor and comedic elements in discussions about film and characters
Negative Logits
pector      -0.15
оÑģÑĮ        -0.15
á»ijng      -0.15
spraying    -0.14
ãĥ³ãĤ°ãĥ«   -0.14
atatype     -0.14
dispers     -0.14
ocator      -0.14
elson       -0.14
lsen        -0.13
Positive Logits
Ser         0.29
Fire        0.29
Shepherd    0.28
Fire        0.22
Jay         0.21
River       0.21
Shepard     0.20
Alliance    0.20
Wash        0.20
Mal         0.20
Activations Density 0.009%