Explanations
statements related to the effectiveness or ineffectiveness of certain actions or policies
arguments related to the effectiveness of various policies or initiatives
Negative Logits
Token          Logit
Printed        -0.85
brood          -0.73
handwritten    -0.73
guessing       -0.68
Naked          -0.67
ridicule       -0.67
disbelief      -0.66
Fancy          -0.66
Fn             -0.65
Jere           -0.64
Positive Logits
Token          Logit
acea           1.31
beneficial     1.21
effic          1.14
allev          1.12
reducing       1.10
sustainable    1.10
deterrent      1.09
alleviate      1.06
deterrence     1.05
boosting       1.04
Activations Density: 0.603%