INDEX
Explanations
names of political or public figures
isolated single letters or characters, particularly at the beginning of words
Negative Logits
Token                  Logit
artifacts              -0.68
hett                   -0.63
CoC                    -0.63
ashtra                 -0.62
ELF                    -0.61
channelAvailability    -0.61
TextColor              -0.59
lasses                 -0.58
Measure                -0.58
AUD                    -0.58
Positive Logits
Token                  Logit
uala                   0.88
Alvarez                0.85
Camer                  0.80
icz                    0.75
wu                     0.74
uma                    0.72
oglu                   0.70
ulum                   0.70
Philippe               0.68
Tec                    0.65
Activations Density 0.220%