INDEX
Explanations
references to specific types of personalities or social roles, particularly masculine traits
Negative Logits
dfx       -0.82
Leone     -0.82
Antiqu    -0.80
Ley       -0.79
Rack      -0.78
Jagu      -0.77
Aid       -0.76
Wand      -0.76
stall     -0.75
osuke     -0.74
Positive Logits
fide            0.87
blocker         0.84
particle        0.81
discharge       0.79
delta           0.79
miner           0.77
decay           0.77
fractions       0.75
parity          0.74
concentration   0.73
Activations Density: 0.007%