INDEX
Explanations
phrases related to specific entities or groups
phrases denoting possession or belonging
Negative Logits
Token          Logit
Pg             -0.73
obyl           -0.69
](             -0.65
displayText    -0.65
.''.           -0.64
={             -0.63
violates       -0.63
foundation     -0.62
concede        -0.62
mere           -0.61
Positive Logits
Token          Logit
choice         1.12
course         0.97
yes            0.97
sorts          0.94
tomorrow       0.94
icial          0.90
antiquity      0.88
Choice         0.87
Tomorrow       0.87
WWII           0.85
Activations Density: 0.162%