Explanations
expressions of personal opinions or preferences in decision-making contexts
Negative Logits
Token        Logit
Biôgrafia    -0.59
femininas    -0.56
milliers     -0.54
constantly   -0.50
Soon         -0.49
aidé         -0.48
ladr         -0.48
centaines    -0.48
meren        -0.48
nemico       -0.47
Positive Logits
Token        Logit
opt          1.20
opting       1.10
opted        1.10
stick        1.09
sticking     0.99
Opt          0.99
Stick        0.97
opts         0.95
Stick        0.94
Opt          0.91
Activations Density: 0.408%