Explanations
first-person singular verbs in negative form
expressions of surprise or disbelief
Negative Logits
neighb        -0.61
lishes        -0.57
ilaterally    -0.55
agric         -0.53
ufact         -0.51
municip       -0.51
ilst          -0.50
iann          -0.50
result        -0.50
osponsors     -0.49
Positive Logits
fuckin         0.86
fucking        0.78
gonna          0.75
funny          0.68
goddamn        0.66
uh             0.65
alright        0.64
kinda          0.64
fucked         0.62
stupid         0.62
Activation Density: 1.235%