Explanations
action phrases
the starts of new turns or major discourse events: tokens that mark a shift into a new question, instruction, or important content.
Negative Logits
"}}           0.48
cotid         0.43
度假          0.43
cotidiana     0.42
NGTH          0.41
pengaturan    0.41
графика       0.41
наў           0.41
"="           0.40
⃘             0.40
Positive Logits
cardinals     0.43
during        0.42
humans        0.41
interest      0.40
dar           0.39
patents       0.39
threats       0.38
endpoints     0.38
enne          0.38
Threats       0.38
Activations Density 0.010%