INDEX
Explanations
references to award events
references to social events or gatherings, particularly those related to holidays or celebrations
Negative Logits
raiſ        -0.89
poffible    -0.87
myſelf      -0.83
himſelf     -0.82
chofe       -0.82
deſt        -0.81
fubject     -0.78
poffe       -0.78
themſelves  -0.78
avoient     -0.77
Positive Logits
s       1.17
s       0.86
        0.81
".      0.81
"])     0.79
iastes  0.77
`,      0.76
%',     0.74
’)      0.74
́s      0.73
Activations Density 0.122%