INDEX
Explanations
pronouns and their associated verb forms
Negative Logits
xxx         -0.38
EXTERN      -0.34
knowing     -0.33
eck         -0.32
choose      -0.32
rg          -0.32
thofe       -0.31
actual      -0.31
choice      -0.31
choosing    -0.31
Positive Logits
complexContent    0.70
المناصب    0.57
majánló           0.54
ſſo               0.52
ujednoznacz       0.52
nahilalakip       0.51
tagext            0.51
påver             0.50
exists            0.50
ſſi               0.50
Activations Density: 0.424%