INDEX
Explanations
phrases indicating perception or interpretation
phrases that indicate perception or opinion
Negative Logits
oller       -0.68
atted       -0.67
rolet       -0.67
rol         -0.66
rower       -0.66
atl         -0.66
LINE        -0.64
zens        -0.64
ãĥ¥         -0.63
wordpress   -0.63
Positive Logits
pires        0.96
opposed      0.94
pired        0.86
follows      0.85
belonging    0.82
criptions    0.78
synonymous   0.77
pers         0.76
well         0.75
expend       0.75
Activations Density 0.095%