INDEX
Explanations
instances of words associated with public or political contexts
Negative Logits
umble             -0.16
ivals             -0.15
alo               -0.15
unta              -0.15
ManagerInterface  -0.15
_manifest         -0.14
ahren             -0.14
avel              -0.14
DirectoryName     -0.14
ope               -0.14
Positive Logits
anch      0.21
spark     0.18
ANCH      0.18
agal      0.16
anchor    0.16
urum      0.15
anch      0.15
anchor    0.15
ady       0.15
defgroup  0.15
Activations Density: 0.028%