Explanations
percentage comparisons in text
phrases indicating comparisons or relationships between different items or groups
Negative Logits
gravity      -0.79
rog          -0.77
rak          -0.72
ger          -0.71
Defenders    -0.70
azaar        -0.69
spe          -0.67
talk         -0.67
gery         -0.66
wordpress    -0.66
Positive Logits
eleph          0.83
sexes          0.77
sidx           0.77
guiActiveUn    0.77
weep           0.74
proport        0.73
nomine         0.71
halves         0.69
nces           0.69
çͰ             0.67
Activations Density: 0.009%