INDEX
Explanations
statements expressing opinions or critiques about cartoons
Negative Logits
алÑİ         -0.15
allee        -0.15
ãĥ           -0.15
dementia     -0.14
Tobacco      -0.14
smoker       -0.14
cigarettes   -0.14
XM           -0.14
tobacco      -0.14
Knox         -0.14
Positive Logits
She          0.30
Princess     0.26
Sword        0.26
He           0.25
MOT          0.24
Prince       0.23
Ether        0.23
Ske          0.23
Masters      0.23
Castle       0.23
Activations Density 0.008%