INDEX
Explanations
instances where actions are described
phrases related to guidelines and requirements
Negative Logits
veland        -0.63
âĸ¬           -0.60
onest         -0.55
allegiance    -0.54
orthy         -0.52
Franch        -0.52
idency        -0.51
antine        -0.51
chance        -0.51
Canad         -0.51
Positive Logits
above         2.08
below         1.83
mentioned     1.76
above         1.71
listed        1.67
outlined      1.61
described     1.56
discussed     1.54
below         1.49
foregoing     1.43
Activations Density: 0.474%