INDEX
Explanations
key indicators of events or actions within a narrative context
Negative Logits
acha    -0.16
consts  -0.16
ysl     -0.15
wyn     -0.15
uhe     -0.14
edin    -0.14
448     -0.14
beim    -0.14
155     -0.14
auen    -0.14
Positive Logits
ay      0.23
Ãł      0.20
á       0.17
tat     0.16
al      0.16
t       0.16
atty    0.16
Âłt     0.16
at      0.15
Âł      0.15
Activations Density 0.171%