Explanations
dialogue with character names
Negative Logits
娼 0.48
adultery 0.48
Pavlov 0.45
печа 0.45
nuns 0.45
Kwiat 0.45
policewomen 0.43
媾 0.43
Bapak 0.43
женщи 0.42
Positive Logits
Erie 0.49
athlet 0.45
টাইগার 0.45
sporting 0.45
Roswell 0.44
伥 0.44
ج 0.44
Chromebook 0.43
collectively 0.43
laser 0.43
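Logit lists like these are commonly produced by taking the dot product of a feature's decoder direction with each vocabulary token's unembedding vector and ranking tokens by the result. The sketch below assumes that method; the dashboard does not say how these values were computed. The function names (`logit_effects`, `top_logits`) and the toy vectors are hypothetical, and plain Python lists stand in for real model weights.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy 2-dimensional residual stream and a 3-token vocabulary (illustrative only).
    unembed = {"a": [0.5, 1.0], "b": [-0.3, 2.0], "c": [0.1, -1.0]}
    effects = logit_effects([1.0, 0.0], unembed)
    pos, neg = top_logits(effects, 1)
    print("positive:", pos)
    print("negative:", neg)
```

Ranking by a single projection keeps the sketch simple; a real dashboard may normalize the decoder direction or apply a final layer norm first, which would change the magnitudes but is not specified here.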
Activation Density: 0.004%
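If activation density is read as the fraction of tokens on which the feature's activation is above zero (an assumption; the dashboard does not define it), then 0.004% is 0.00004 as a fraction, or about 4 activating tokens per 100,000. A minimal sketch of that calculation, using a hypothetical `activation_density` helper and toy data:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation is strictly above `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

if __name__ == "__main__":
    # Toy data: the feature fires on 4 of 100,000 tokens.
    acts = [0.0] * 99_996 + [1.2] * 4
    print(f"{activation_density(acts) * 100:.3f}%")  # prints 0.004%
```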