INDEX
Negative Logits
Trou        -0.07
discover    -0.07
'O          -0.06
htt         -0.06
Newton      -0.06
intuition   -0.06
Joy         -0.06
typings     -0.06
Sevent      -0.06
Trou        -0.06
Positive Logits
based       0.21
Based       0.19
based       0.16
Based       0.16
-based      0.15
-Based      0.13
biased      0.08
_based      0.08
ựa          0.08
기반        0.08
Activations Density: 0.066%