Explanations
discussions about political arguments and hypocrisy
Negative Logits
祝          -0.13
alat        -0.13
crud        -0.12
.secret     -0.12
zeich       -0.12
clearfix    -0.12
erule       -0.12
lesai       -0.12
рич         -0.12
883         -0.11
Positive Logits
argument     0.75
arguments    0.75
Argument     0.65
argument     0.65
arguments    0.63
Arguments    0.63
Argument     0.61
arg          0.59
Arguments    0.57
argue        0.56
Activations Density 1.009%