Negative Logits
AQP          0.81
Circuit      0.81
freely       0.75
authorized   0.75
Leah         0.73
AAP          0.73
Authorized   0.71
David        0.70
Mechanics    0.70
wyróż        0.69
Positive Logits
whatever       1.04
coworkers      1.01
whatever       0.96
coworker       0.89
...]           0.89
Whatever       0.86
roommates      0.84
roommate       0.80
argumentative  0.80
เพื่อน           0.79
Activations Density 0.000%