Explanations
phrases related to causality or explanation
causal phrases or expressions indicating reasons for something occurring
Negative Logits
Token               Logit
=-=-=-=-=-=-=-=-    -0.80
arest               -0.79
asp                 -0.73
arro                -0.71
Sheep               -0.71
chip                -0.69
adel                -0.68
Leone               -0.68
hov                 -0.68
ivas                -0.65
Positive Logits
Token               Logit
diligence           1.16
giving              0.93
itiz                0.75
cancell             0.75
dilig               0.71
gers                0.70
*/(                 0.69
llers               0.69
wcs                 0.65
)=(                 0.65
Activations Density: 0.021%
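
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the page does not say how the value was computed, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and serve only to show the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    nonzero activation. The threshold of 0.0 is illustrative.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 21 of 100,000 tokens.
sample = [0.0] * 99_979 + [0.5] * 21
density = activation_density(sample)
formatted = f"{density:.3%}"
print(formatted)
```

A value this low would mean the feature is sparse: it is silent on nearly every token and fires only on a narrow set of contexts.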