INDEX
Explanations
formal statements or declarations
phrases related to formal declarations or assertions
Negative Logits
elsius       -0.81
rys          -0.75
osponsors    -0.72
axy          -0.71
rowd         -0.71
versely      -0.71
bat          -0.70
cffff        -0.70
ricular      -0.69
engeance     -0.66
Positive Logits
statements   1.09
statement    0.90
pronoun      0.85
uttered      0.85
gow          0.79
Statements   0.77
ARB          0.76
regarding    0.75
warr         0.75
Statement    0.75
Activations Density: 0.026%