Explanations
phrases indicating purpose or intention
Negative Logits
Token       Logit
ing         -0.76
ostante     -0.71
polymorph   -0.70
avoient     -0.70
Gorg        -0.69
Tikang      -0.68
guenos      -0.68
Brutus      -0.68
culoare     -0.66
voorbeeld   -0.65
Positive Logits
Token       Logit
nel         1.05
untuk       0.99
Untuk       0.78
nell        0.78
ในการ       0.77
Nel         0.75
để          0.74
برای        0.73
לה          0.72
Để          0.70
Activations Density 0.028%