Explanations
names starting with Bert or Burt
Negative Logits
Token         Logit
Pless         -0.74
braio         -0.72
ours          -0.70
shouting      -0.69
benötigen     -0.69
coherent      -0.69
omer          -0.68
家用          -0.67
screaming     -0.67
françaises    -0.66
Positive Logits
Token         Logit
ższy          0.81
Olivenöl      0.81
serving       0.81
ALLENG        0.79
Serving       0.79
が上が        0.77
Награды       0.76
konci         0.74
جز            0.73
NIK           0.73
Activations Density: 0.016%