INDEX
Explanations
requests for assistance or information on technical topics
Text with informal abbreviations and punctuation
asking for help or questions
NEGATIVE LOGITS
Token       Logit
;-)         -0.56
})));       -0.55
‐           -0.51
:-)         -0.51
}           -0.49
denomina    -0.48
…..         -0.47
omiast      -0.47
%}          -0.47
#}          -0.46
POSITIVE LOGITS
Token       Logit
goddamn     1.20
idk         1.17
lmao        1.01
fuckin      1.00
idk         1.00
fucking     0.99
Idk         0.99
FUCKING     0.97
iirc        0.94
tbh         0.93
Activations Density: 0.824%