Explanations
start of actions or processes
Negative Logits
когато         0.59
ומ             0.57
ግሎ             0.56
veloc          0.55
commerciales   0.54
второго        0.53
tls            0.53
һәм            0.52
және           0.52
uvijek         0.52
Positive Logits
increasingly   0.50
passing        0.47
nuanced        0.46
deliberate     0.45
appealing      0.45
refined        0.44
qualitatively  0.44
extensively    0.43
a              0.42
stumbling      0.42
Activations Density 0.024%