INDEX
Explanations
references to consequences and impacts of various events or policies
Negative Logits
Token     Logit
557       -0.14
ancest    -0.13
ua        -0.13
azor      -0.13
480       -0.13
_activ    -0.13
sklad     -0.13
struk     -0.13
_mux      -0.13
prest     -0.12
Positive Logits
Token     Logit
effects   0.45
Effects   0.38
effects   0.38
impact    0.36
Effects   0.36
Impact    0.35
-effects  0.34
impacts   0.34
effect    0.34
impact    0.33
Activations Density: 0.232%