INDEX
Explanations
phrases that denote cause and effect or purpose
Negative Logits
Token        Logit
malink       -0.17
withStyles   -0.16
ibold        -0.15
umba         -0.15
ician        -0.14
ithub        -0.14
lak          -0.14
oward        -0.14
oust         -0.14
sonian       -0.14
Positive Logits
Token        Logit
chance       0.17
chances      0.17
Chance       0.16
Chance       0.15
ento         0.14
deal         0.14
omi          0.13
aven         0.13
piration     0.13
caus         0.13
Activations Density: 0.127%