Explanations
phrases related to various different categories or concepts, potentially encompassing a range of subjects from social issues to physical objects
references to various political and social ideologies, as well as groups and their associated characteristics
Negative Logits
ãĥį          -0.67
confir       -0.66
ËĪ           -0.64
20439        -0.60
ãĥ«          -0.58
é¾įå         -0.58
alloween     -0.58
Ö            -0.58
quartered    -0.56
kefeller     -0.56
Positive Logits
etc          1.68
,...         1.27
etc          1.25
â̦)          1.19
...)         1.13
â̦           1.04
,            0.99
â̦           0.99
ect          0.98
,,,,         0.93
Activations Density 0.473%
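
The density figure above is not defined in this listing. A common reading, assumed here, is the fraction of token positions on which the feature's activation is non-zero. The sketch below illustrates that reading only: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not the data behind the 0.473% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard's exact definition is not given.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.473% would mean the feature fires on roughly 1 in 211 token positions.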