Explanations
prepositions followed by entities/places/actions
Negative Logits
нельзя  0.66
placebo  0.59
sebagainya  0.58
cenderung  0.57
де  0.57
লোকজন  0.56
geen  0.55
δεν  0.54
спраши  0.54
っぽい  0.54
Positive Logits
aceste  0.85
aquest  0.83
aquest  0.80
această  0.79
هذه  0.78
acest  0.76
هذا  0.73
Acest  0.73
цієї  0.73
这位  0.70
Activation Density: 0.000%