Explanations
references to significant actions or decisions and their importance within the context
Negative Logits
actics        -0.17
awan          -0.16
\<^           -0.16
ories         -0.15
geç           -0.14
ãģĹãĤĩ        -0.14
ihat          -0.14
abbo          -0.13
itis          -0.13
Antwort       -0.13
Positive Logits
mistake       0.32
mistakes      0.27
distinction   0.26
noises        0.26
acquaintance  0.26
contribution  0.25
decisions     0.25
connection    0.24
adjustments   0.24
decision      0.24
Activations Density 0.115%