INDEX
Explanations
instances of emotional expressions and physical actions related to family dynamics
Negative Logits
ney         -0.17
ãĥĭãĥ¼      -0.17
ãĤ«ãĥ«      -0.16
Borders     -0.15
thunder     -0.15
Thunder     -0.15
amam        -0.14
/command    -0.14
/pp         -0.14
ãģ£ãģ       -0.14
Positive Logits
icerca      0.15
Hydra       0.15
uren        0.14
privileged  0.14
سازÛĮ       0.14
кÑĥлÑı      0.14
ÏĦ          0.14
avir        0.13
ëĪĦ         0.13
357         0.13
Activations Density 0.000%