INDEX
Explanations
phrases indicating causal relationships and important points within a text
Negative Logits
ubits     -0.17
egra      -0.16
readcr    -0.15
доÑĢ      -0.15
ики       -0.14
peare     -0.14
ARGET     -0.14
ÏĢλα      -0.14
trouble   -0.14
alie      -0.14
Positive Logits
ÂŃi          0.15
ulares       0.14
jom          0.14
vô           0.14
AUTHORIZED   0.14
Fol          0.13
prompt       0.13
acus         0.13
net          0.13
conclude     0.13
Activations Density 0.295%