INDEX
Explanations
References to legal or political content, specifically legislation, government actions, and societal issues
Negative Logits
oids      -0.65
ATK       -0.62
Boat      -0.61
cpp       -0.60
voy       -0.59
Bee       -0.59
oppers    -0.58
clave     -0.58
teness    -0.58
Brewer    -0.57
Positive Logits
structured  0.79
interact    0.79
behave      0.78
unfolded    0.74
shaping     0.72
pport       0.71
fared       0.69
differs     0.69
stood       0.69
proced      0.69
Activations Density: 19.928%