Explanations
instances of characters taking a stand or defending their beliefs
Negative Logits
Token         Logit
separately    -0.15
onth          -0.15
ÏĦαν          -0.14
iw            -0.14
ildo          -0.14
vek           -0.14
CTX           -0.14
separate      -0.14
ullo          -0.14
onom          -0.13
Positive Logits
Token         Logit
against       0.17
against       0.15
ilos          0.15
ammer         0.15
пози          0.15
erez          0.15
sahip         0.15
<!--[         0.14
ç«Ļ           0.14
statements    0.14
Activations Density 0.033%