Explanations
references to identity and transformation concepts
Negative Logits
bias            -0.55
treat           -0.53
Bias            -0.51
quo             -0.50
iprot           -0.49
PLEMENT         -0.49
fédéral         -0.48
èvement         -0.48
]})             -0.48
zeciw           -0.48
Positive Logits
cloned           0.67
identity         0.66
imposter         0.65
cloned           0.64
Identität        0.64
impostor         0.62
Identity         0.61
TextAppearance   0.59
identidad        0.58
Identity         0.58
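The page does not say how these logit lists were produced. One common approach for sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then sort the resulting per-token effects. The sketch below illustrates that assumed method. The names `top_and_bottom_logits`, `decoder_dir`, `W_U` and `vocab` are hypothetical, and the five-token vocabulary and identity unembedding are toy values, not real model weights.

```python
import numpy as np

def top_and_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Assumption: decoder_dir has shape (d_model,), W_U has shape (d_model, d_vocab),
    and vocab[i] is the string for token id i.
    """
    effect = decoder_dir @ W_U          # per-token logit effect, shape (d_vocab,)
    order = np.argsort(effect)          # token ids sorted by ascending effect
    negative = [(vocab[i], float(effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: an identity "unembedding" so the effect equals the decoder direction.
vocab = ["bias", "treat", "identity", "cloned", "impostor"]
W_U = np.eye(len(vocab))
decoder_dir = np.array([-0.55, -0.53, 0.66, 0.67, 0.62])
positive, negative = top_and_bottom_logits(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this assumption, repeated-looking entries such as "cloned" or "Identity" would be distinct token ids (for example, with and without a leading space) that happen to render identically.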
Activations Density 0.436%
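Activation density is read here as the fraction of sampled token positions on which the feature fires. Under that reading, a density of 0.436% means the feature is active on roughly 4 to 5 tokens per thousand. The page does not state its threshold or sample size, so the sketch below assumes "fires" means a strictly positive activation. The function name `activation_density` and the 100,000-token toy sample are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: activations is a 1-D array of this feature's activations over a
    sample of tokens, and "active" means activation > threshold (default 0).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy sample: the feature is active on 436 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:436] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```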