INDEX
Explanations
individuals associated with specific interests or characteristics
references to fandoms or fan culture
Negative Logits
CVE      -0.63
".[      -0.62
.''.     -0.61
'.       -0.58
aughs    -0.57
ãĥł      -0.57
."[      -0.54
]);      -0.54
ophon    -0.54
%).      -0.53
Positive Logits
yourself      1.29
you           1.11
please        1.11
your          1.06
yourselves    1.05
PLEASE        1.05
then          1.02
THEN          1.01
you           0.90
your          0.87
Activations Density: 0.402%