INDEX
Explanations
themes related to societal structure and oppression
Negative Logits
рана
-0.16
ide
-0.15
thereby
-0.15
是什么
-0.14
-valu
-0.14
orm
-0.14
hvad
-0.14
strerror
-0.14
etri
-0.14
究
-0.13
Positive Logits
where
0.60
where
0.46
où
0.42
donde
0.41
όπου
0.40
где
0.39
where
0.39
_where
0.38
Where
0.37
wherein
0.37
Activations Density 0.592%