INDEX
Explanations
verbs indicating causation or outcomes
phrases that indicate causation or consequence
Negative Logits
atching   -0.59
entric    -0.58
aves      -0.56
afort     -0.56
arer      -0.55
iling     -0.55
Fram      -0.54
ature     -0.54
inen      -0.53
ZI        -0.52
Positive Logits
inex        0.87
nowhere     0.82
ãĥĥãĥī      0.80
gers        0.79
uez         0.75
-+          0.74
inevitably  0.73
wcs         0.71
iments      0.70
us          0.69
Activations Density 0.047%
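
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: this matches what the dashboard calls "Activations Density";
    the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 5 of which are active.
toy_activations = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(toy_activations)

# Format as a percentage, the way the dashboard reports it.
print(f"{density:.3%}")
```

Under this reading, a density of 0.047% would correspond to the feature firing on roughly 47 out of every 100,000 tokens.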