INDEX
Explanations
phrases related to commands or instructions
negation or conditional phrases in the text
Negative Logits
respectively   -0.71
thereto        -0.62
..."           -0.60
…"             -0.59
.","           -0.57
thereof        -0.56
ω              -0.53
prest          -0.52
etc            -0.52
π              -0.51
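Raw exports of this list can show entries such as `Ïī`, `ÏĢ` and `â̦"`. These are most likely byte-level BPE vocabulary strings rather than real text: each byte of the token's UTF-8 encoding is displayed as one printable character (the `â̦"` entry appears to be `âĢ¦"` with its middle character damaged in extraction). Below is a minimal sketch of reversing that mapping. It assumes a GPT-2-style byte-level tokenizer, which the card itself does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict:
    """Rebuild the GPT-2-style reversible map from each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE vocabulary string back into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["Ïī", "ÏĢ", 'âĢ¦"']:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `Ïī` decodes to `ω`, `ÏĢ` decodes to `π`, and `âĢ¦"` decodes to `…"`. That last token is the Unicode-ellipsis counterpart of the ASCII `..."` entry in the same list.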
Positive Logits
resa           1.06
odore          0.94
xiety          0.84
notations      0.69
romeda         0.67
swers          0.67
mosp           0.65
zbollah        0.62
withstanding   0.62
bidden         0.61
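The negative and positive lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of one common way such lists are produced. It assumes the feature's decoder direction is projected straight through the model's unembedding matrix `W_U`, ignoring the final layer norm. The function name `logit_effects` and all weights are hypothetical toy values, not the dashboard's code or data.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive logit effects.

    Assumption: effect[token] = decoder_dir @ W_U[:, token], i.e. the
    feature's decoder vector projected directly through the unembedding.
    W_U is stored as d_model rows, each of length vocab_size.
    """
    d_model = len(decoder_dir)
    effects = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy example with made-up weights (d_model = 2, vocabulary of 4 tokens).
vocab = ["respectively", "thereof", "resa", "odore"]
W_U = [
    [1.0, 0.5, -0.2, 0.0],
    [0.0, -0.5, 1.0, 0.8],
]
decoder_dir = [-0.5, 1.0]

neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
print("Negative:", [(t, round(v, 2)) for t, v in neg])
print("Positive:", [(t, round(v, 2)) for t, v in pos])
```

With these toy weights, `thereof` (-0.75) and `respectively` (-0.5) come out most suppressed, and `resa` (1.1) and `odore` (0.8) come out most promoted. Real dashboards may also fold in layer-norm scaling or other normalization, which this sketch omits.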
Activation density: 0.755%
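The density figure is usually read as the fraction of token positions on which the feature is active. At 0.755%, the feature fires on roughly 1 in 132 tokens. Below is a minimal sketch of that calculation. It assumes "active" means an activation strictly above a threshold of 0. The function name `activation_density`, the `threshold` parameter, and the toy activations are illustrative assumptions, not the dashboard's definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy activations for eight token positions (made-up values).
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # two of the eight positions fire
```

Here two of eight positions fire, giving 25.000%. A real density is computed over a much larger token sample.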