INDEX
Explanations
personal descriptors or characteristics
references to personal experiences or opinions
Negative Logits
Token        Logit
xual         -1.13
LER          -0.79
ï¸           -0.77
GAN          -0.72
GGGG         -0.72
ktop         -0.72
etting       -0.70
XM           -0.70
Faster       -0.69
Tens         -0.69
Positive Logits
Token        Logit
ised         1.18
ized         1.03
ities        0.99
ization      0.95
belongings   0.94
isations     0.93
isation      0.90
pronouns     0.89
hygiene      0.87
trainer      0.86
Activations Density: 0.016%