Explanations
references to identity and its variations within texts
Negative Logits
Geſ          -0.63
Monfieur     -0.63
Eſ           -0.52
pleaſure     -0.51
gethan       -0.50
newData      -0.49
ſou          -0.48
Füße         -0.48
paſſ         -0.47
financière   -0.46
Positive Logits
identity      1.05
Identity      0.93
identity      0.93
Identity      0.90
identities    0.87
IDENTITY      0.85
Identities    0.76
IDENTITY      0.69
got           0.66
clone         0.66
Activations Density 0.436%
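
As a rough reading aid for the density figure, the sketch below converts the percentage into a fraction and an approximate "one in N tokens" rate. It assumes that activation density means the share of tokens on which the feature has a nonzero activation; the page itself does not define the term, and the function name density_to_rate is hypothetical.

```python
def density_to_rate(density_percent):
    """Convert an activation density given in percent to (fraction, tokens per activation).

    Assumption: density is the share of tokens on which the feature fires.
    """
    fraction = density_percent / 100.0
    # Average spacing between activations, rounded to the nearest whole token.
    tokens_per_activation = round(1.0 / fraction)
    return fraction, tokens_per_activation


fraction, every_n = density_to_rate(0.436)
print(f"fires on a fraction {fraction:.5f} of tokens, roughly 1 in {every_n}")
```

Under that assumption, a density of 0.436% would correspond to the feature firing on about one token in every couple of hundred, which fits a feature tied to a single word family.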