Explanations
titles followed by names speaking
Negative Logits
Always 0.40
чно 0.36
ridurre 0.35
Checked 0.34
過 0.34
destroyed 0.34
Chunks 0.33
Turns 0.33
过度 0.33
decimated 0.33
Positive Logits
explained 0.47
aforesaid 0.45
said 0.44
son 0.43
emphasised 0.43
selaku 0.41
ian 0.40
said 0.40
বলেন 0.40
elaborated 0.39
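Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that convention. The function `logit_effects`, the names `decoder_dir` and `unembed`, and the toy weights are all hypothetical, invented only to show the ranking step.

```python
# Hypothetical sketch: rank tokens by the feature's direct effect on the logits,
# assuming score[t] = decoder_dir . unembed[:, t] (decoder row times W_U).
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    decoder_dir: feature decoder vector, length d_model.
    unembed: d_model rows, each of length len(vocab) (the unembedding W_U).
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example with invented weights (d_model = 2, four-token vocabulary).
vocab = ["said", "explained", "destroyed", "Always"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.2, -0.3],
    [0.2, 0.2, -0.2, -0.4],
]
top_positive, top_negative = logit_effects(decoder_dir, unembed, vocab, k=2)
print(top_positive)  # "said" and "explained" score highest
print(top_negative)  # "Always" and "destroyed" score lowest
```

The scores are signed here, so the negative list is ordered from the most negative score upward. How a given dashboard displays those values (signed or as magnitudes) is not assumed.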
Activation Density 0.001%
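Activation density is the fraction of tokens on which the feature fires. A density of 0.001% means roughly one token in 100,000. The sketch below assumes a feature "fires" when its activation is strictly above a threshold of 0.0. The threshold, the function names, and the example activations are hypothetical.

```python
# Hypothetical sketch: density = fraction of tokens where the feature fires,
# assuming "fires" means activation > threshold (0.0 by default).
def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def format_percent(fraction, digits=3):
    """Render a fraction as a percentage string, e.g. 1e-05 -> '0.001%'."""
    return f"{fraction * 100:.{digits}f}%"


# Invented example: the feature fires on 1 token out of 100,000.
acts = [0.0] * 99_999 + [2.3]
density = activation_density(acts)
print(format_percent(density))  # 0.001%
```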