INDEX
Explanations
pronoun followed by verb phrase
Negative Logits
violations    0.50
aviation      0.48
unting        0.47
か            0.46
essä          0.45
seepage       0.44
प्रतिकूल        0.44
そば          0.43
ageddon       0.42
씩            0.42
Positive Logits
for           0.50
з             0.44
cleanly       0.42
entspre       0.42
ેલ            0.42
compon        0.41
ونم           0.41
Gilroy        0.41
ﻷ             0.41
definitively  0.41
Activations Density 0.002%