INDEX
Explanations
mentions of branding or labeling
occurrences of the word "bl" in various contexts
Negative Logits
Gund        -0.76
HER         -0.72
reon        -0.68
LER         -0.66
Simulator   -0.66
Democr      -0.64
ORGE        -0.62
roit        -0.61
aeda        -0.61
Engel       -0.61
Positive Logits
anca        1.17
ossom       1.14
anco        1.09
umenthal    1.08
anche       1.07
adder       1.07
estone      0.99
adders      0.98
acks        0.97
anches      0.95
Activations Density 0.022%