Explanations
actions followed by prepositions
Negative Logits
Token       Value
izing       0.68
serían      0.62
amis        0.60
isation     0.57
ising       0.57
ization     0.56
ial         0.56
isasi       0.54
ﺍﻟ          0.54
ality       0.53
Positive Logits
Token       Value
ר           0.81
ט           0.74
ד           0.72
ת           0.71
ז           0.71
ס           0.70
ле          0.69
Ш           0.68
ה           0.62
ש           0.61
Activations Density 0.930%