Explanations
phrases related to politics
punctuation marks, particularly periods
Negative Logits
advis             -0.71
dotted            -0.67
applicable        -0.63
theoret           -0.60
chopping          -0.60
confidentiality   -0.60
wise              -0.59
olation           -0.58
imposition        -0.58
iture             -0.58
Positive Logits
Va                1.03
Skill             0.98
E                 0.91
O                 0.90
A                 0.90
J                 0.88
AX                0.87
Ire               0.82
C                 0.82
S                 0.80
Activations Density: 0.070%