INDEX
Explanations
phrases related to reasoning or justification
the word "because," indicating causal relationships in statements
Negative Logits
agin      -0.78
yan       -0.75
nin       -0.74
mint      -0.72
wn        -0.72
Gas       -0.69
lem       -0.67
Luc       -0.67
thal      -0.64
ries      -0.64
Positive Logits
*/(       0.90
uristic   0.75
proxies   0.74
endment   0.72
they      0.71
ority     0.69
ecause    0.67
akening   0.67
urers     0.65
uras      0.64
Activations Density 0.079%