INDEX
Negative Logits
userc         -0.70
Supplemental  -0.62
afety         -0.62
âĹ            -0.61
stub          -0.61
©¶æ           -0.59
NETWORK       -0.59
hower         -0.59
ocument       -0.58
pretext       -0.57
Positive Logits
agos          0.92
forth         0.78
oha           0.75
gow           0.73
reath         0.71
rant          0.70
furt          0.68
asks          0.68
ted           0.68
stad          0.67
Activations Density: 0.089%