INDEX
Explanations
references to people or groups, particularly those involved in actions or judgments
Negative Logits
ston        -0.15
aran        -0.14
.Include    -0.14
arkan       -0.14
nutshell    -0.14
ichern      -0.14
itself      -0.13
estone      -0.13
himself     -0.13
xit         -0.13
Positive Logits
otherwise   0.20
matters     0.19
otherwise   0.17
mattered    0.17
might       0.16
menstr      0.16
CKER        0.16
matter      0.16
Matters     0.16
Otherwise   0.16
Activations Density 0.181%