INDEX
Explanations
references to the concept of being newly introduced or initiated into a situation
Negative Logits
Newman   -0.15
antas    -0.14
ÑģÑĤа    -0.14
andal    -0.14
orias    -0.14
manship  -0.13
anse     -0.13
ãĥ³ãĥĢ (ンダ)  -0.13
uder     -0.13
META     -0.13
Positive Logits
ãģ°ãģĭãĤĬ (ばかり)  0.20
mitter   0.16
newly    0.16
swire    0.16
arrived  0.15
entr     0.15
нез      0.15
åīĽ (剛)  0.15
ertz     0.15
ippi     0.14
Activations Density 0.147%
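
Some tokens in the logit lists (for example `ãģ°ãģĭãĤĬ`) look like they are shown in the byte-level BPE alphabet rather than as readable text. The sketch below assumes a GPT-2-style byte-to-character mapping, which this page does not confirm; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Build the GPT-2-style byte -> printable character table (assumed mapping)."""
    # Bytes that are already printable map to the character with the same code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned the next code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    try:
        raw = bytes(BYTE_DECODER[ch] for ch in token)
    except KeyError:
        # The token contains characters outside the byte alphabet,
        # so it is returned unchanged.
        return token
    # A token may hold only part of a multi-byte character; replace what cannot decode.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ°ãģĭãĤĬ", "ãĥ³ãĥĢ", "åīĽ", "newly"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the top positive token decodes to the Japanese ばかり, which after a past-tense verb means "just (did)", and `åīĽ` decodes to 剛, which can mean "just now" in Chinese. Both are consistent with the explanation given for this feature.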