Explanations
key structural elements and thematic concepts in discussions or narratives
Negative Logits
apa       -0.16
like      -0.16
ifo       -0.15
hea       -0.15
ways      -0.14
isman     -0.14
olog      -0.14
оÑĩкÑĥ    -0.14
anders    -0.14
istr      -0.14
Positive Logits
liest         0.20
iest          0.20
choice        0.20
choice        0.20
duy           0.18
Choice        0.17
franca        0.16
ropoda        0.16
_choice       0.16
equivalent    0.16
Activations Density: 0.221%