INDEX
Explanations
references to the term "Mans" in various contexts
Negative Logits
ecycle    -0.19
echan     -0.18
ocene     -0.17
pty       -0.17
ADATA     -0.16
venes     -0.16
echn      -0.16
tok       -0.15
ivor      -0.15
ostel     -0.15
POSITIVE LOGITS
laughter  0.33
field     0.31
ions      0.31
ouri      0.27
uet       0.25
oor       0.25
our       0.23
arov      0.22
ory       0.22
IONS      0.22
Activations Density: 0.011%