INDEX
Explanations
terms related to science, politics, and social issues
key terms and concepts related to societal structures and dynamics
Negative Logits
local      -0.73
lihood     -0.57
ometown    -0.56
ospital    -0.55
ewitness   -0.55
bia        -0.52
AMD        -0.52
Merit      -0.51
bestos     -0.50
Ward       -0.50
Positive Logits
theorist       0.66
jargon         0.62
manuals        0.57
women          0.55
zsche          0.54
experimented   0.54
nowadays       0.53
insofar        0.53
discourse      0.52
manifesto      0.52
Activations Density 0.986%