INDEX
Explanations
individual marks or indicators within a context
references to scoring or evaluation metrics
Negative Logits
Token     Logit
ILLE      -0.93
abama     -0.71
anooga    -0.69
ettel     -0.68
agan      -0.68
rouch     -0.66
imation   -0.66
odka      -0.65
Kut       -0.64
brut      -0.61
Positive Logits
Token     Logit
manship   1.50
downs     1.07
down      1.03
eters     0.97
Twain     0.95
emark     0.93
ups       0.88
posts     0.87
lights    0.84
eting     0.83
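Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes that method; the page does not say exactly how its values were computed (some tools also fold in the final layer norm), and the function name `top_bottom_logits` and its parameters are hypothetical. It uses plain Python lists so it runs without extra dependencies.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    decoder_dir: d_model floats, assumed to be the feature's SAE decoder row.
    unembed: d_model rows of vocab_size floats, assumed to be the unembedding W_U.
    vocab: vocab_size token strings, in the same column order as unembed.
    Returns (top, bottom): lists of (token, logit) pairs, most extreme first.
    """
    d_model = len(decoder_dir)
    # logits[j] = sum over i of decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit so the most negative tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom


# Toy example: d_model = 2, four-token vocabulary (values are made up).
top, bottom = top_bottom_logits(
    [1.0, 0.5],
    [[1.0, 0.0, -1.0, 2.0],
     [0.0, 1.0, 0.0, -1.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(top)     # [('d', 1.5), ('a', 1.0)]
print(bottom)  # [('c', -1.0), ('b', 0.5)]
```

With a real model the vocabulary has tens of thousands of entries, so an array library would normally replace the nested loops; the ranking step is the same.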
Activation density: 0.030%
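An activation density of 0.030% means the feature fires on roughly 3 of every 10,000 tokens. The sketch below assumes density is the percentage of token positions where the activation is strictly above a threshold of zero; the exact threshold and token sample used for this page are not stated, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the feature fires.

    A position counts as firing when its activation is strictly greater than
    threshold (assumed to be 0.0 here).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 3 nonzero activations among 10,000 tokens gives about 0.03%.
acts = [0.0] * 10_000
for pos in (17, 4_200, 9_999):
    acts[pos] = 1.2
print(f"{activation_density(acts):.3f}%")  # 0.030%
```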