INDEX
Explanations
statements indicating metrics or measurements of significance
Negative Logits
Drawn      -0.83
vulner     -0.76
Antar      -0.74
Mobil      -0.68
sacrific   -0.66
Belfast    -0.66
Gardens    -0.64
Agric      -0.63
retirees   -0.61
iffs       -0.61
Positive Logits
ï¸ı        1.01
same       0.95
fter       0.91
href       0.90
ski        0.90
Pg         0.86
shall      0.86
felt       0.85
mir        0.82
own        0.80
Activations Density: 0.052%