Explanations
defining or explaining phrases
Negative Logits
unele      0.54
aufgrund   0.50
cosa       0.48
Niall      0.46
Gunners    0.46
Austria    0.46
Dublin     0.46
trotz      0.46
androidx   0.45
Blanch     0.45
Positive Logits
gena       0.42
le         0.40
raj        0.39
작         0.38
ikal       0.38
ling       0.38
water      0.37
ok         0.37
Ģ          0.37
покры      0.37
Activations Density: 0.003%