Explanations
phrases or words associated with giving positive feedback or admiration
contexts of admiration and commendation
Negative Logits
Token         Logit
arin          -0.75
gamer         -0.72
olen          -0.69
¯¯            -0.68
carry         -0.66
perature      -0.64
Hack          -0.61
Gamer         -0.60
otype         -0.60
NetMessage    -0.60
Positive Logits
Token         Logit
praise        1.10
praises       0.95
praising      0.92
hovah         0.92
praised       0.84
eous          0.83
ifully        0.80
thanking      0.78
acclaim       0.73
imaru         0.72
Activations Density: 0.020%