INDEX
Explanations
conjunctions and pronouns denoting involvement or responsibility
references to individuals and their actions within a context
Negative Logits
Token        Logit
arious       -0.64
jun          -0.63
Family       -0.63
course       -0.61
170          -0.61
detail       -0.59
00000        -0.59
uku          -0.59
requency     -0.58
ene          -0.58
Positive Logits
Token        Logit
drew         0.98
coined       0.94
mattered     0.93
drove        0.90
decides      0.89
initiated    0.88
oversaw      0.87
prompted     0.87
determines   0.87
sparked      0.87
Activations Density: 0.099%