INDEX
Explanations
quotations in text
quoted speech or statements
Negative Logits
shit       -0.91
crap       -0.85
guy        -0.79
stuff      -0.79
bean       -0.77
asshole    -0.76
creep      -0.74
shake      -0.73
gonna      -0.73
jug        -0.72
Positive Logits
Our         1.47
Such        1.44
While       1.43
Although    1.42
Therefore   1.39
These       1.38
Despite     1.38
Given       1.37
We          1.37
However     1.36
Activations Density: 0.142%