Explanations
sentences that indicate approval or consent in research contexts
Negative Logits
autorytatywna  -0.54
so             -0.53
something      -0.52
Something      -0.51
simply         -0.51
Something      -0.51
@"/            -0.50
qualcosa       -0.49
quite          -0.48
hea            -0.47
Positive Logits
CloseOperation  0.82
qrstuvwxyz      0.75
ſelves          0.75
aarrggbb        0.74
صوتيه           0.68
ſelf            0.68
)');            0.68
NOPQRST         0.66
themſelves      0.66
Kessel          0.65
Activations Density: 0.723%