Explanations
references to personal relationships and familial connections
Negative Logits
odpowied    -0.16
icism       -0.15
irim        -0.15
iasm        -0.15
icas        -0.15
indow       -0.14
venes       -0.14
orie        -0.14
plusplus    -0.14
emachine    -0.14
Positive Logits
esson       0.18
time        0.16
aren        0.15
elay        0.15
spent       0.15
rame        0.15
GOODMAN     0.15
spend       0.14
on          0.14
Spend       0.14
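Logit tables like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding and then ranking tokens by the result. The sketch below assumes that definition (dot product of the decoder vector with each token's unembedding column). The function names `logit_effects` and `top_and_bottom` are hypothetical, and the tiny vectors in any usage are illustrative, not this feature's real weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed definition: the direct effect of a feature on a token's logit
    is the dot product of the feature's decoder direction with that token's
    unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]         # most negative first
    top = ranked[::-1][:k]      # most positive first
    return top, bottom


# Usage with a handful of values copied from the tables above.
effects = {"esson": 0.18, "time": 0.16, "spent": 0.15,
           "odpowied": -0.16, "icism": -0.15, "plusplus": -0.14}
top, bottom = top_and_bottom(effects, 2)
print("positive:", top)
print("negative:", bottom)
```

Ranking the full vocabulary this way yields a "Negative Logits" list and a "Positive Logits" list of the kind shown here; real dashboards may also apply normalization or filtering that this sketch omits.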
Activation Density: 0.030%
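Activation density is typically read as the percentage of tokens on which the feature fires. Under that reading, 0.030% means the feature is active on roughly 3 of every 10,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` is hypothetical, and the token sample is synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold`
    (assumed definition of 'activation density')."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example: 10,000 tokens, of which the feature fires on 3.
acts = [0.0] * 10_000
acts[10], acts[500], acts[9_000] = 1.2, 0.4, 2.0
print(f"{activation_density(acts):.3f}%")
```

A density this low marks a sparse feature. Its examples are rare, so the explanation above rests on a small number of activating contexts.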