Explanations
prepositions indicating location or placement in relation to objects
Negative Logits
edited       -0.64
digest       -0.62
aired        -0.60
period       -0.59
hy           -0.57
abbrevi      -0.56
staggered    -0.55
oreal        -0.55
utenberg     -0.55
karma        -0.55
Positive Logits
DonaldTrump  0.82
erous        0.80
tops         0.72
ibaba        0.70
sers         0.70
slaught      0.70
lie          0.69
yx           0.69
btn          0.69
sie          0.66
Activations Density 0.115%
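
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.115%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, 23 of which activate the feature.
sample = [0.0] * 19977 + [1.5] * 23
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.115% means the feature fires on roughly 1 in 870 tokens, so it is sparse.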