Explanations
special characters or punctuation marks in the text
Negative Logits
Token   Logit
ment    -0.70
ous     -0.63
ше      -0.63
isson   -0.63
ness    -0.62
an      -0.62
ligen   -0.61
Montal  -0.59
ism     -0.59
McCar   -0.58
Positive Logits
Token   Logit
"}      1.77
'}      1.68
"]}     1.64
']}     1.62
]")]    1.56
]}      1.55
)}      1.55
")}     1.52
"}      1.50
})}     1.47
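The Positive Logits and Negative Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way dashboards of this kind produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses pure Python and toy numbers; `logit_effects` and `top_and_bottom` are hypothetical helper names, and real tools may add normalization steps not shown.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: `unembed` is laid out as [d_model][vocab], so the effect on
    token j is sum_i decoder_dir[i] * unembed[i][j].
    """
    vocab = len(unembed[0])
    return [sum(d * row[j] for d, row in zip(decoder_dir, unembed))
            for j in range(vocab)]

def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    pairs = sorted(zip(tokens, effects), key=lambda p: p[1])  # ascending by effect
    negative = pairs[:k]        # most negative first
    positive = pairs[::-1][:k]  # most positive first
    return positive, negative

# Toy example: a 2-dimensional "model" with a 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[1.0, 0.0, 3.0],
           [0.0, 1.0, 2.0]]
tokens = ['"}', 'ment', ')}']
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, tokens, k=1)
print(effects)    # [1.0, -0.5, 2.0]
print(positive)   # [(')}', 2.0)]
print(negative)   # [('ment', -0.5)]
```

In this toy setup, a feature aligned with closing-bracket tokens surfaces them at the top of the positive list, mirroring the pattern in the lists above.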
Activation Density: 0.346%
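Activation density is usually read as the percentage of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the threshold and the hypothetical `activation_density` helper are illustrative, not taken from this page. Under that definition, 0.346% corresponds to roughly 346 firing tokens out of every 100,000.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# 346 firing tokens out of 100,000 gives a density of 0.346%.
acts = [0.0] * 99_654 + [1.0] * 346
print(activation_density(acts))
```

A density this low indicates a sparse feature that is inactive on the vast majority of tokens.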