INDEX
Explanations
phrases referring to actions or characteristics related to people and groups
pronouns that refer to people or groups
Negative Logits
Token       Logit
Darling     -0.66
mi          -0.60
kamp        -0.59
zu          -0.58
Quantity    -0.58
erenn       -0.58
mini        -0.57
aleb        -0.56
oji         -0.56
dim         -0.56
Positive Logits
Token         Logit
pires         0.79
accompanies   0.77
arose         0.68
fter          0.66
comprise      0.65
comprises     0.65
constitutes   0.65
hesda         0.64
rocked        0.63
violates      0.63
Activations Density: 0.259%
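
As a rough illustration of what a density figure like this could mean, the sketch below assumes it is the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration only; not taken from the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 259 of 100,000 token positions.
sample = [1.5] * 259 + [0.0] * (100_000 - 259)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.259% would correspond to the feature firing on roughly 26 of every 10,000 sampled tokens.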