INDEX
Explanations
the word "most" indicating significant prevalence or importance
Negative Logits
uce      -0.18
IPP      -0.17
odb      -0.16
δÏĮν     -0.15
imple    -0.15
Sil      -0.15
inth     -0.14
ount     -0.14
prus     -0.14
iation   -0.14
Positive Logits
likely        0.25
importantly   0.25
/all          0.24
likely        0.23
acci          0.23
Likely        0.20
errat         0.19
afa           0.19
efa           0.19
eller         0.18
Activations Density 0.057%