Explanations
phrases indicating the availability and presentation of information
Negative Logits
Token        Logit
Identified   -0.15
Allowed      -0.15
ott          -0.14
æ¶Ī          -0.14
_ALLOWED     -0.14
доп          -0.14
ga           -0.14
540          -0.14
ATEST        -0.13
atest        -0.13
Positive Logits
Token        Logit
available    0.20
available    0.19
below        0.18
provided     0.17
posted       0.17
contained    0.16
furnished    0.16
ushman       0.16
iska         0.16
appa         0.15
Activations Density: 0.085%