INDEX
Explanations
phrases indicating a range or interval of values
Negative Logits
Token      Logit
OGR        -0.82
rav        -0.81
gow        -0.75
jew        -0.72
awaru      -0.68
insula     -0.68
justice    -0.68
pot        -0.68
eto        -0.68
rats       -0.66
Positive Logits
Token      Logit
halves     0.91
sexes      0.84
genders    0.80
January    0.80
1910       0.77
1932       0.75
1945       0.74
midnight   0.74
1975       0.74
1981       0.74
Activations Density: 0.038%