Explanations
references to the word "man"
Negative Logits
ted      -0.18
tin      -0.17
ga       -0.17
genic    -0.17
gate     -0.17
onaut    -0.17
iesen    -0.17
gen      -0.16
go       -0.16
gie      -0.15
Positive Logits
iac         0.31
hattan      0.28
agements    0.25
alysis      0.24
agers       0.23
ifold       0.23
UEL         0.23
ufact       0.23
agment      0.22
ifest       0.22
Activations Density: 0.062%