INDEX
Explanations
phrases related to concepts or entities being disputed or labeled with quotation marks
references to terms that are considered "so-called" or labeled in a specific context
Negative Logits
Dickinson      -0.83
ibrary         -0.80
Rivals         -0.79
Rica           -0.76
ANGEL          -0.75
DRAG           -0.71
ADRA           -0.71
Larson         -0.70
Sparrow        -0.69
reservation    -0.69
Positive Logits
called         1.09
sized          1.03
historic       0.96
shaped         0.94
sounding       0.94
oriented       0.91
named          0.90
focused        0.90
eared          0.89
equipped       0.89
Activations Density 0.022%
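
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of sampled token positions on which the feature's activation is above a threshold. This is an assumption about the metric, not a description of how this page derives it. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, feature fires on 2 of them.
sample = [0.0] * 10_000
sample[17] = 3.2
sample[4242] = 0.7

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.020%
```

A density this low means the feature is silent on almost every token, which fits a feature tied to a narrow pattern such as the labeled or "so-called" terms described in the explanations.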