INDEX
Explanations
learned about events and stories
Negative Logits
трите      0.47
announc    0.43
foto       0.43
telefon    0.42
طلب        0.41
evin       0.41
wifi       0.41
photos     0.40
distribu   0.40
necess     0.40
Positive Logits
Cyan       0.55
Bind       0.51
8          0.49
9          0.47
3          0.46
Car        0.45
Cyan       0.45
6          0.45
কে         0.44
Azure      0.44
Activations Density: 0.001%