Explanations
numerical data or statistics
Negative Logits
Token        Logit
expire       -0.72
leisure      -0.65
ossip        -0.65
withdrawn    -0.64
bulletin     -0.64
accus        -0.64
gossip       -0.63
restraint    -0.62
disclosures  -0.62
chat         -0.62
Positive Logits
Token        Logit
️             0.80
retty        0.79
770          0.76
Unch         0.74
920          0.72
670          0.71
WARE         0.71
530          0.69
440          0.68
Alone        0.68
Activations Density: 0.254%