Explanations
key phrases and indicators of causation or consequence
Negative Logits
isor      -0.16
arl       -0.15
argar     -0.15
онаÑħ     -0.15
Latter    -0.15
339       -0.14
dge       -0.14
ئ         -0.14
261       -0.14
arlar     -0.14
Positive Logits
Ed        0.16
ause      0.15
myp       0.15
ed        0.15
up        0.14
aga       0.14
/REC      0.14
Eb        0.14
vari      0.14
Listing   0.14
Activations Density 0.001%