Explanations
the presence and variety of existential phrases and quantifiers
Negative Logits
ריכה          -0.56
newline       -0.49
oluzione      -0.48
flink         -0.48
cause         -0.47
dis           -0.47
'             -0.46
SPJ           -0.46
"             -0.46
alnya         -0.45

Positive Logits
AndEndTag     0.89
tranſ         0.76
îna           0.75
Charlemagne   0.75
незавершена   0.73
cammino       0.71
centrality    0.69
تضيفلها       0.69
poffible      0.69
ſta           0.69
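In sparse-autoencoder feature dashboards like this one, the positive and negative logits are typically the vocabulary tokens whose output logits the feature most increases or decreases. They are obtained by projecting the feature's decoder direction through the model's unembedding matrix and listing the extreme values. The pure-Python sketch below shows only that ranking step. The names `decoder_dir`, `w_u`, `logit_effects`, and `top_k` are hypothetical, and the numbers are toy values rather than this feature's real weights.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy sketch: a real dashboard uses the model's full vocabulary
# and the feature's learned decoder direction, not these hand-picked values.

def logit_effects(decoder_dir: Sequence[float],
                  w_u: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in w_u.items()}

def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

decoder_dir = [0.5, -1.0, 0.25]
w_u = {
    "a": [1.0, 0.0, 0.0],  # effect  0.5
    "b": [0.0, 1.0, 0.0],  # effect -1.0
    "c": [0.0, 0.0, 4.0],  # effect  1.0
    "d": [1.0, 1.0, 0.0],  # effect -0.5
}
effects = logit_effects(decoder_dir, w_u)
pos, neg = top_k(effects, 2)
print(pos)  # [('c', 1.0), ('a', 0.5)]
print(neg)  # [('b', -1.0), ('d', -0.5)]
```

The dashboard shows ten tokens per side. In this sketch that corresponds to `k=10` over the full vocabulary.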
Activations Density 0.041%
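An activation density of 0.041% means the feature fires on roughly 41 of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero. The dashboard's actual threshold and sample size are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data: 41 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(41):
    acts[i * 1000] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.041%
```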