NEGATIVE LOGITS
Token       Logit
Had         0.43
HAD         0.41
Had         0.40
್           0.37
had         0.37
Appropri    0.36
out         0.36
vissa       0.35
condenser   0.34
vis         0.34
POSITIVE LOGITS
Token       Logit
anson       0.42
imore       0.42
Gy          0.40
NSE         0.40
nSamples    0.39
궁금        0.39
Gyro        0.39
몰          0.39
Gunnar      0.38
At          0.38
Activations Density 0.001%