INDEX
Explanations
verbs ending in 's' that indicate causation or influence
phrases that indicate causation or effects
Negative Logits
Token        Logit
ban          -0.70
thia         -0.67
scrimmage    -0.64
nurs         -0.64
Route        -0.60
---------    -0.59
Ples         -0.59
ILCS         -0.57
bour         -0.56
ASE          -0.56
Positive Logits
Token        Logit
hift         1.22
sure         0.98
paio         0.84
sense        0.82
enders       0.79
arnaev       0.78
akable       0.77
ÄŁ           0.76
berra        0.74
itives       0.74
Activations Density 0.121%