Explanations
un- prefix followed by nouns
Negative Logits
ویت  1.82
Suy  1.77
Sarı  1.75
osit  1.74
bbe  1.72
ไต  1.72
attano  1.72
omponent  1.71
sams  1.68
ری  1.67
Positive Logits
ंचल  1.77
ITIES  1.61
悱  1.52
ဘူး  1.47
काशी  1.46
Realms  1.46
dares  1.44
circum  1.42
पणे  1.41
ski  1.41
Activations Density 0.518%