INDEX
Explanations
explicit statements or propositions
references to various types of statements
Negative Logits
Token       Logit
elsius      -0.78
rys         -0.76
rowd        -0.72
MpServer    -0.70
rowing      -0.69
itals       -0.68
bid         -0.67
Friend      -0.66
versely     -0.66
axy         -0.66
Positive Logits
Token       Logit
statements  0.94
statement   0.90
ariat       0.88
pronoun     0.81
uttered     0.79
gow         0.79
ARB         0.77
Statement   0.76
regarding   0.76
Statements  0.74
Activations Density: 0.027%