INDEX
Explanations
occurrences of significant keywords or phrases indicating various forms of notices, challenges, or constraints in the text
Negative Logits
odox      -0.17
ccc       -0.16
út        -0.15
ghan      -0.15
äm        -0.14
Florence  -0.14
orda      -0.14
opsis     -0.14
CCC       -0.14
conds     -0.14
Positive Logits
ono      0.16
610      0.15
dun      0.15
missive  0.15
pit      0.14
ONO      0.14
åı¸      0.14
elin     0.14
pawn     0.14
Ïĥκε     0.14
Activations Density 0.001%