Explanations
references to parental figures and family relationships
Negative Logits
itself          -0.91
itself          -0.82
itſelf          -0.78
Itself          -0.76
themselves      -0.76
UnsafeEnabled   -0.72
themſelves      -0.69
kasarigan       -0.67
transfieras     -0.64
évaluateur      -0.64
Positive Logits
who             0.67
whom            0.60
’               0.56
Personendaten   0.55
AccessorTable   0.54
biologique      0.54
atial           0.53
'               0.51
Välislingid     0.50
(no token shown) 0.49
Activations Density: 0.181%