INDEX
Explanations
references to specific entities or names
the special character 'ĺ' in various contexts
Negative Logits
Token           Logit
Seym            -0.84
disadvant       -0.75
condem          -0.74
Enlightenment   -0.71
pestic          -0.70
raints          -0.70
explan          -0.70
trainers        -0.68
mathemat        -0.67
welf            -0.67
Positive Logits
Token           Logit
ï¸ı             1.32
lean            1.01
log             0.92
ģ               0.91
ĺ               0.91
ï¸              0.89
Ģ               0.88
leans           0.85
ļ               0.85
ł               0.82
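Several positive-logit tokens (and the 'ĺ' named in the explanations) look like stray accented letters. One plausible reading, which this page does not confirm, is that the underlying model uses a GPT-2-style byte-level BPE vocabulary, in which every raw byte is displayed as a printable stand-in character. The sketch below rebuilds that byte-to-character table and inverts it to recover the raw bytes behind each token. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, and the tokenizer assumption should be checked against the actual model.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a stand-in code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


# Tokens from the positive-logit list, written as escapes to avoid encoding ambiguity.
tokens = [
    "\u00ef\u00b8\u0131",  # ï¸ı
    "\u00ef\u00b8",        # ï¸
    "\u0123",              # ģ
    "\u013a",              # ĺ
    "\u0122",              # Ģ
    "\u013c",              # ļ
    "\u0142",              # ł
]

for tok in tokens:
    raw = token_to_bytes(tok)
    # Only complete UTF-8 sequences decode; lone continuation bytes do not.
    try:
        decoded = "U+%04X" % ord(raw.decode("utf-8"))
    except (UnicodeDecodeError, TypeError):
        decoded = "partial UTF-8 sequence"
    print(repr(tok), raw.hex(), decoded)
```

Under this assumption, the top token decodes to the bytes `ef b8 8f`, the UTF-8 encoding of U+FE0F (variation selector-16), and the single-character tokens are lone UTF-8 continuation bytes (`0x81`, `0x98`, `0x80`, `0x9a`, `0xa0`). That would suggest the feature promotes fragments of multi-byte characters such as emoji, rather than the letter 'ĺ' itself.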
Activations Density 0.034%
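The page does not define how the density figure is computed. A common convention, assumed here, is the fraction of token positions on which the feature's activation is above zero. The minimal sketch below uses that definition; the function name `activation_density` and the sample activations are hypothetical, with the synthetic counts chosen only so that the result matches the 0.034% shown.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition; the page does not state how its figure is derived.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the feature fires on 34 of 100,000 token positions.
sample = [0.0] * 99_966 + [1.5] * 34
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is silent on almost all tokens, which is consistent with a feature tied to rare byte fragments.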