Explanations
instances of specific phrases or concepts related to influence and personal relationships
Negative Logits
Token     Logit
civil     -0.15
ibe       -0.15
ÑĪки      -0.15
boil      -0.14
erk       -0.14
ÑĪка      -0.14
blame     -0.14
OK        -0.14
kos       -0.14
ts        -0.14
Positive Logits
Token     Logit
endoza    0.17
Sesso     0.16
UAGE      0.15
rana      0.14
ÙĬÙĥا     0.14
¼åIJĪ      0.14
lied      0.14
ãĤ¯ãĤ»     0.14
ourn      0.13
awe       0.13
Activations Density: 0.001%