Explanations
phrases related to positive feedback or appreciation
positive expressions of opinions or preferences
Negative Logits
iliated       -0.66
å°Ĩ           -0.65
probable      -0.63
Registered    -0.60
interrupted   -0.59
iped          -0.58
WARE          -0.58
lethal        -0.57
bia           -0.57
udder         -0.56
Positive Logits
because       0.94
tho           0.91
bec           0.77
!!!!          0.75
though        0.73
cause         0.72
uncond        0.71
!!!!!!!!      0.70
!!!           0.70
:-)           0.68
Activations Density 0.613%