Explanations
phrases that convey evaluation or judgment about people or things
Negative Logits
saveiro          -0.64
dilatation       -0.64
lorette          -0.63
extradition      -0.62
atheism          -0.61
ModelExpression  -0.61
Sancho           -0.60
Baptists         -0.60
pector           -0.60
örté             -0.60
Positive Logits
“                0.91
a                0.82
"                0.78
‘                0.73
truly            0.71
an               0.69
„                0.62
«                0.60
'                0.59
being            0.58
Activations Density 0.425%