Explanations
phrases that indicate a rephrasing or simplification of information
Negative Logits
straints     -0.17
erable       -0.15
zan          -0.15
مح           -0.15
adius        -0.14
straint      -0.14
tring        -0.14
ogue         -0.14
Telescope    -0.14
anou         -0.14
Positive Logits
xi           0.15
▍            0.15
ph           0.15
arb          0.15
appers       0.14
isco         0.14
люд          0.14
MDB          0.14
phrase       0.14
ropa         0.14
Activations Density 0.167%