Negative Logits
reasons    0.87
Reasons    0.84
Reason     0.84
Reason     0.83
Reasons    0.81
reason     0.78
razones    0.73
理由       0.73
కారణ       0.71
理由は     0.70
Positive Logits
reng       0.42
温         0.41
溫         0.40
seller     0.39
gek        0.39
warm       0.38
involved   0.36
hist       0.36
aval       0.36
member     0.35
Activations Density 0.000%