INDEX
Explanations
phrases related to comparison or evaluation
references to relationships or comparisons in terms of various subjects
Negative Logits
Token        Logit
burgh        -0.79
iologist     -0.71
falls        -0.69
pots         -0.68
redited      -0.67
Card         -0.66
enter        -0.64
stars        -0.64
iard         -0.63
izens        -0.63
Positive Logits
Token          Logit
sheer          1.03
chronological  0.83
geography      0.76
pure           0.75
legality       0.75
geographical   0.73
fairness       0.73
keeping        0.72
proximity      0.72
improving      0.72
Activations Density 0.045%
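
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of token positions where the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 20,000 token positions, 9 of which are active.
sample = [0.0] * 20000
for i in range(9):
    sample[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")  # 9 / 20000 formatted as a percentage
```

Under this reading, a density of 0.045% corresponds to roughly 4 or 5 active positions per 10,000 tokens, so the feature fires rarely.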