INDEX
Explanations
themes of deception and manipulation in narratives
Negative Logits
odia          -0.17
engu          -0.17
asser         -0.16
itez          -0.15
TextLabel     -0.15
à¹Ģà¸ķà¸Ńร   -0.14
Taken         -0.14
llx           -0.14
uba           -0.13
ìĿį           -0.13
Positive Logits
agem          0.18
arov          0.17
ibur          0.16
behind        0.16
preco         0.15
ár            0.15
ourt          0.15
igo           0.15
plan          0.14
æİ§           0.14
Activations Density: 0.328%