INDEX
Explanations
proper nouns, specifically names of individuals
references to individuals, particularly their actions or attributes
Negative Logits
marginally      -0.60
EngineDebug     -0.59
(?,             -0.58
."              -0.57
().             -0.55
moderately      -0.53
âĶľ             -0.53
",              -0.52
horm            -0.51
®,              -0.50
Positive Logits
]               2.59
],"             2.52
]"              2.49
]."             2.41
]'              2.34
],              2.19
']              2.14
]-              2.13
].              2.08
?]              1.89
Activations Density 0.126%