INDEX
Negative Logits
Lug      0.49
Wyn      0.48
al       0.46
Jer      0.46
Butter   0.45
Sil      0.45
an       0.44
Gmail    0.44
Work     0.43
Lug      0.43
Positive Logits
vérit       0.50
utors       0.49
médicas     0.48
jurisdict   0.48
dictators   0.48
tokamaks    0.47
systèmes    0.46
یداری       0.46
emphatic    0.46
नियां       0.46
Activations Density: 0.001%