INDEX
Explanations
titles or mentions of people's names
specific punctuation or separators in text, primarily periods
Negative Logits
advis            -0.83
applicable       -0.71
confidentiality  -0.68
olation          -0.67
heels            -0.66
metadata         -0.65
onies            -0.65
itives           -0.62
chopping         -0.62
edly             -0.61
Positive Logits
J                1.34
Va               1.17
O                1.14
A                1.12
C                1.12
E                1.12
L                1.10
P                1.10
M                1.09
R                1.09
Activations Density: 0.042%