INDEX
Explanations
phrases related to causes of events or conditions
Negative Logits
owship    -0.75
rim       -0.74
isse      -0.70
lite      -0.70
rons      -0.69
atri      -0.69
agog      -0.69
apest     -0.69
rooft     -0.68
reet      -0.68
Positive Logits
why        0.86
attribut   0.86
why        0.81
blamed     0.81
causing    0.80
thereof    0.80
ality      0.78
affecting  0.77
WHY        0.77
Causes     0.77
Activations Density 0.078%