INDEX
Explanations
references to 'paragraphs' or similar structural elements within texts
Negative Logits
Token         Logit
emas          -0.19
rego          -0.18
estate        -0.16
ene           -0.15
egal          -0.15
enthal        -0.15
braco         -0.15
rif           -0.15
acia          -0.15
arseille      -0.15
Positive Logits
Token         Logit
Par           0.25
par           0.25
excellence    0.23
-par          0.23
adox          0.22
(par          0.22
abolic        0.21
allax         0.21
liament       0.21
aguay         0.20
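The logit lists above can be read as the feature's direct effect on next-token logits. Such lists are commonly produced by multiplying the feature's SAE decoder direction by the model's unembedding matrix, then taking the largest and smallest entries. This page does not say whether it uses exactly this computation; some pipelines also fold in layer norm, for example.

The minimal sketch below assumes that convention. It uses tiny hypothetical matrices and made-up token names. `logit_effects` and `top_and_bottom` are illustrative helpers, not a library API.

```python
from typing import List, Tuple


def logit_effects(d_dec: List[float], W_U: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Direct logit effect of one feature: effect[j] = sum_i d_dec[i] * W_U[i][j].

    d_dec is the feature's decoder direction (length d_model) and W_U is the
    unembedding matrix with shape (d_model, n_vocab). Both are hypothetical here.
    """
    effects = []
    for j, token in enumerate(vocab):
        effect = sum(d_dec[i] * W_U[i][j] for i in range(len(d_dec)))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda te: te[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy example: d_model = 2 and a vocabulary of 4 made-up tokens.
d_dec = [1.0, 0.5]
W_U = [[0.2, -0.1, 0.0, 0.3],
       [0.1, 0.2, -0.4, 0.0]]
vocab = ["a", "b", "c", "d"]

positive, negative = top_and_bottom(logit_effects(d_dec, W_U, vocab), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

With real weights, the vocabulary would be the model's tokenizer vocabulary and k would be 10, matching the length of each list on this page.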
Activation density: 0.020%
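Activation density is the fraction of token positions on which the feature fires. Assume "fires" means an activation strictly above zero; neither the threshold nor the dataset behind this figure is given here. Under that assumption, 0.020% corresponds to 0.0002, or roughly 2 of every 10,000 tokens. A minimal sketch with a hypothetical helper, `activation_density`:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: the feature fires on 2 of 10,000 token positions.
acts = [0.0] * 9_998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```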