Explanations
positive adjectives describing quality or experience
Negative Logits
autorytatywna    -0.57
Ligações         -0.56
těch             -0.54
démission        -0.49
diejenigen       -0.49
protože          -0.48
ambién           -0.47
those            -0.47
neler            -0.47
förr             -0.47
Positive Logits
joint            0.51
bleau            0.47
,                0.47
<<<<<<<<<<<<<<   0.46
entire           0.46
surla            0.46
MainAxisSize     0.44
a                0.44
whole            0.43
face             0.43
Activations Density: 0.781%