INDEX
Explanations
references to specific individuals and their contributions in a given context
New Auto-Interp
Negative Logits
–         -1.34
…..       -1.20
……        -1.07
….        -1.07
…         -0.97
…)        -0.93
…..       -0.92
…"        -0.91
…….       -0.91
[…]       -0.90
Positive Logits
''        2.67
,''       2.44
''        2.40
?''       2.39
.''       2.36
''.       2.10
'',       2.02
'')       2.01
'''       1.97
‘‘        1.97
Activation Density: 0.736%