Explanations
references to individuals and their actions or states
Negative Logits
Token        Logit
seamnă       -0.75
bave         -0.53
Jegyzetek    -0.52
Silly        -0.51
activa       -0.47
pett         -0.47
<bos>        -0.47
ift          -0.47
Vou          -0.45
ulent        -0.45
Positive Logits
Token        Logit
himself      1.94
himself      1.65
his          1.48
Himself      1.44
his          1.33
His          1.33
His          1.21
He           1.09
He           1.08
he           1.07
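
As a rough illustration of how token/logit lists like the two above can be produced, the sketch below scores every vocabulary token by the dot product of a feature direction with that token's unembedding row, then takes the highest and lowest scores. This is an assumption about the general technique, not something this page states; the names `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and the numbers are not the ones listed here.

```python
def logit_effects(direction, unembed, vocab):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding row (hypothetical toy setup, pure Python)."""
    scores = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    # Highest score first.
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)


def top_and_bottom(ranked, k):
    """Return the k most positive and the k most negative entries."""
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy data: 4 tokens, 2-dimensional model space (illustrative values only).
vocab = ["himself", "his", "He", "Silly"]
unembed = [[2.0, 0.1], [1.5, 0.3], [1.0, -0.2], [-0.5, 0.9]]
direction = [1.0, 0.0]

positive, negative = top_and_bottom(logit_effects(direction, unembed, vocab), 2)
print(positive)
print(negative)
```

In a real model the unembedding matrix has one row per vocabulary entry, so the same ranking would run over tens of thousands of tokens rather than four.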
Activations Density: 0.441%
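
A density figure of this kind is usually read as the share of token positions on which the feature is active. The sketch below computes that share under the assumption that "active" means an activation strictly above zero; the function name `activation_density`, the threshold, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes one activation value per token position (hypothetical setup).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: 10,000 token positions, 44 of which fire.
sample = [0.0] * 10_000
for i in range(44):
    sample[i * 200] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.440%
```

A value well under 1% means the feature is sparse: it is silent on the vast majority of tokens.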