INDEX
Explanations
the presence of the word "with" and its various contexts
Negative Logits
thin        -0.84
sburg       -0.80
station     -0.72
Awakens     -0.68
fair        -0.68
deen        -0.66
book        -0.66
hops        -0.66
TG          -0.65
fuck        -0.64
Positive Logits
regard      1.11
drawn       1.09
stood       1.07
regards     0.96
draw        0.94
assistance  0.93
respect     0.86
hopes       0.85
impunity    0.85
apologies   0.84
Activations Density: 0.117%