INDEX
Explanations
positive outcomes or achievements
expressions of quality and value related to actions or outcomes
Negative Logits
anwhile   -0.84
Strauss   -0.68
wx        -0.62
horizont  -0.61
rought    -0.60
makers    -0.60
ioxide    -0.60
Creator   -0.59
urgical   -0.59
ennes     -0.59
Positive Logits
sense          0.79
impression     0.77
contributions  0.72
contribution   0.71
dent           0.70
Ò              0.69
commit         0.68
headlines      0.68
strides        0.67
TEXT           0.66
Activations Density: 0.116%