INDEX
Explanations
references to children and their activities
Negative Logits
Token       Logit
eer         -0.19
e           -0.18
uido        -0.17
ваннÑı      -0.16
eed         -0.16
etak        -0.16
eil         -0.15
ezi         -0.15
onica       -0.15
endas       -0.14
Positive Logits
Token       Logit
nap         0.29
ults        0.22
ages        0.21
friendly    0.21
/people     0.20
/ad         0.20
aged        0.19
-friendly   0.19
errick      0.18
friendly    0.18
Activations Density 0.020%
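
A density figure like the one above is usually read as the fraction of tokens on which the feature fires at all. The sketch below assumes that definition (activation strictly above a threshold of 0); the function name `activation_density` and the sample data are hypothetical and are not taken from this page or from any specific tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active, not an average of activation magnitudes.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.020%
```

Under this reading, a density of 0.020% means the feature is active on roughly 2 of every 10,000 tokens, which is what a sparse, specific feature would be expected to look like.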