INDEX
Explanations
proper nouns
mentions of the name "Ban" or variations of it
Negative Logits
SERVICE        -0.76
ctl            -0.74
æ©Ł            -0.66
Generations    -0.65
Democr         -0.65
PROG           -0.64
IMAGES         -0.63
OTAL           -0.63
paio           -0.61
IRE            -0.61
Positive Logits
anas           1.28
quet           1.15
anan           1.06
ished          0.99
ner            0.98
nered          0.97
jo             0.95
ning           0.93
nery           0.93
aji            0.92
Activations Density 0.015%