INDEX
Explanations
correctness and common phrases
Negative Logits
Presiden 0.74
surfaces 0.74
omega 0.71
russ 0.70
President 0.68
owaniu 0.67
хь 0.67
threat 0.65
ishops 0.65
Threats 0.64
Positive Logits
rectify 0.94
정확 0.87 (Korean: "accurate")
download 0.85
correct 0.84
incorrect 0.82
baiki 0.81 (Malay/Indonesian: "fix, repair")
Correct 0.81
donate 0.80
inaccurate 0.80
Tjiwarl 0.78
Activations Density 0.002%
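The density figure is easier to read as a ratio. The sketch below is minimal and assumes that density means the fraction of sampled tokens on which the feature's activation is nonzero. The card does not define the term, so treat this as an assumption. The function name `activation_density` and the toy data are hypothetical.

```python
# Minimal sketch, ASSUMING activation density = (# tokens where the feature
# fires, i.e. activation > 0) / (total # tokens sampled).
# `activation_density` is a hypothetical helper, not a dashboard API.

def activation_density(activations: list[float]) -> float:
    """Return the fraction of token positions with a strictly positive activation."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0.0)
    return firing / len(activations)


# A density of 0.002% is a fraction of 2e-5,
# i.e. roughly one firing token in every 50,000 tokens.
density_percent = 0.002
fraction = density_percent / 100
tokens_per_firing = 1 / fraction

# Hypothetical toy data: 100,000 tokens, and the feature fires on 2 of them.
toy = [0.0] * 100_000
toy[123] = 3.1
toy[98_765] = 0.4
print(f"{activation_density(toy) * 100:.3f}%")  # prints "0.002%"
```

Under this reading, a feature at 0.002% density is extremely sparse. It stays silent on almost all text and fires only in narrow contexts.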