Explanations
references to divisions or categorizations within a text
Negative Logits
dar          -0.97
someone      -0.83
illard       -0.77
aunt         -0.70
times        -0.68
awoken       -0.66
ante         -0.66
aston        -0.66
iron         -0.66
IRO          -0.64
Positive Logits
halves        1.00
categories    0.95
chronological 0.94
factions      0.89
manageable    0.87
phases        0.87
groups        0.83
separate      0.83
tiers         0.83
distinct      0.81
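The two lists above rank output tokens by the score shown next to each one, strongest first. The page does not say how those scores are computed. Assuming they are already available as a mapping from token to score, the selection of the top and bottom entries can be sketched as follows. The function name `top_and_bottom_logits` and the parameter `k` are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs.

    Hypothetical helper: assumes the per-token scores are already computed.
    """
    # Sort every token once by score, highest first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # The most negative scores sit at the end; reverse so the strongest comes first.
    negative = ranked[::-1][:k]
    return positive, negative


# Illustrative subset of the scores listed above.
scores = {"halves": 1.00, "categories": 0.95, "phases": 0.87,
          "dar": -0.97, "someone": -0.83, "aunt": -0.70}
pos, neg = top_and_bottom_logits(scores, k=2)
print(pos)  # [('halves', 1.0), ('categories', 0.95)]
print(neg)  # [('dar', -0.97), ('someone', -0.83)]
```

Sorting once and slicing from both ends keeps the sketch simple. For a full vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every token.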
Activation Density: 0.063%
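Activation density is commonly read as the share of tokens on which the feature fires. The sketch below assumes that a token counts as "firing" when its activation exceeds a threshold of 0. The function name `activation_density` and its `threshold` parameter are hypothetical. Under that reading, 0.063% corresponds to roughly 63 firing tokens out of every 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Hypothetical helper: the firing threshold of 0 is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 63 firing tokens out of 100,000 reproduces the density shown above.
acts = [1.0] * 63 + [0.0] * (100_000 - 63)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.063%
```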