Explanations
instances of praise or compliments
Negative Logits
Token       Logit
Accepted    -0.15
окон        -0.14
avern       -0.14
ina         -0.13
Warn        -0.13
Symbols     -0.13
.ant        -0.13
accepted    -0.13
ellig       -0.13
ARGS        -0.13
Positive Logits
Token          Logit
compliment     0.47
praise         0.47
complement     0.41
compliments    0.40
praises        0.39
comple         0.34
praising       0.34
complimentary  0.33
praised        0.30
è¤             0.27
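The dashboard does not say how these logit values are computed. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding and rank tokens by the result. The sketch below assumes that method. The function names `logit_effects` and `top_k`, and the toy 2-dimensional vectors, are hypothetical and are not the dashboard's data.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = dot(decoder_dir, unembedding_column(token)).

def logit_effects(decoder_dir, unembed_cols):
    """Return {token: effect} for every token in the (toy) unembedding."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }

def top_k(effects, k, largest=True):
    """Return the k (token, effect) pairs with the largest (or most negative) effect."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]

# Toy example with made-up numbers.
decoder_dir = [0.4, 0.2]
unembed_cols = {
    "praise": [1.0, 0.0],
    "compliment": [0.5, 0.5],
    "Accepted": [-0.2, 0.1],
    "Warn": [0.0, -0.5],
}
effects = logit_effects(decoder_dir, unembed_cols)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 2, largest=False))
```

With these toy vectors, "praise" and "compliment" receive the largest positive effects and "Warn" and "Accepted" the most negative ones. This mirrors the shape of the two lists above, but not their values.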
Activation Density: 0.289%
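Read as a rate, 0.289% means the feature is active on about 2.89 of every 1,000 tokens in whatever sample was used to measure it. The sketch below assumes density is the percentage of tokens whose activation is strictly above zero. The dashboard states neither the threshold nor the sample, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy sample (made-up): the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.5, 3.1]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```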