INDEX
Explanations
words related to annoyance and irritation, particularly within a social context like guilds or clans in online communities
Neuron Alignment
Index    Value    % of L₁
964      +0.10    0.3%
1235     +0.08    0.2%
2030     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2030     +0.10       0.03
1953     +0.08       0.04
963      +0.08       0.04
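The Correlated Neurons listing reports two different similarity measures, P. Corr. and Cos Sim., and they need not agree. The sketch below shows one way the two could be computed for a pair of plain numeric vectors. It is an illustration only: the function names `pearson` and `cosine` are hypothetical, and which vectors the dashboard actually compares (activations, weights, or something else) is not stated in the source.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of the mean-centered vectors,
    divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the
    Euclidean norms, with no mean-centering."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Toy vectors (made up for illustration, not taken from the dashboard).
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]
print(pearson(u, v), cosine(u, v))
```

Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not, so the same pair of vectors can score differently on the two measures.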
Negative Logits
lmfao          -0.71
Hahahahaha     -0.67
🤣🤣           -0.66
bascul         -0.61
Hahah          -0.60
;-;            -0.60
hahah          -0.59
Lmao           -0.57
Ehh            -0.56
quelquefois    -0.55
Positive Logits
annoying       0.87
irritating     0.69
incessant      0.66
incess         0.61
annoyance      0.61
pimiento       0.60
endless        0.59
constant       0.59
unbearable     0.58
repetitive     0.57
Activations Density 0.877%