INDEX
Explanations
references to things being released or put out into the world
phrases that include the word "out."
Negative Logits
cious           -0.73
avorite         -0.70
iosity          -0.68
jriwal          -0.68
interstitial    -0.60
VK              -0.59
xit             -0.58
ugh             -0.57
turnover        -0.57
antry           -0.56
Positive Logits
fitted          1.01
lier            0.92
flows           0.92
smart           0.85
stretched       0.84
posts           0.80
lived           0.80
lander          0.79
doors           0.75
fitting         0.75
Activations Density 0.077%