INDEX
Explanations
how things are presented or operate
Negative Logits
Become          0.42
превра          0.40
ഉണ്ടാ            0.40
menjadi         0.39
дворе           0.39
achievements    0.38
becomes         0.38
Different       0.38
Different       0.37
变为             0.37
Positive Logits
behaved         0.57
worded          0.55
configured      0.53
看待             0.51
presented       0.50
orientated      0.49
behaving        0.49
behaved         0.48
behave          0.48
configured      0.48
Activations Density: 0.012%