INDEX
Explanations
phrases indicating causation or reason
phrases that introduce reasoning or justification
Negative Logits
ãĤ©      -0.70
zona     -0.69
LAB      -0.68
Contact  -0.65
istine   -0.63
rador    -0.63
oslav    -0.62
obyl     -0.61
é¾       -0.61
urred    -0.59
Positive Logits
give      1.46
example   1.42
instance  1.38
starters  1.29
bidden    1.28
cing      1.26
getting   1.24
cible     1.18
gotten    1.08
reasons   1.04
Activations Density 0.058%