INDEX
Explanations
mentions of specific values or ratings
instances of a specific character or symbol
Negative Logits
raints         -0.84
Seym           -0.83
mathemat       -0.75
disadvant      -0.74
misunder       -0.74
trainers       -0.72
Enlightenment  -0.70
condem         -0.70
pestic         -0.69
enegger        -0.69
Positive Logits
ï¸ı   1.29
lean  1.00
log   0.91
ï¸    0.86
ĺ     0.86
ģ     0.82
£     0.82
rd    0.81
deg   0.81
Ģ     0.80
Activations Density: 0.034%