INDEX
Explanations
natural language or sciences
Negative Logits
occurring    0.91
naturales    0.75
occurring    0.75
Mejor        0.74
occuring     0.74
ocorr        0.74
vontade      0.73
naturais     0.72
voglio       0.71
wanting      0.71
Positive Logits
surface        0.73
history        0.72
Napoleon       0.71
stupe          0.70
Heritage       0.70
Hazard         0.70
History        0.70
HISTORY        0.69
поверхность    0.68
सतह            0.68
Activations Density: 0.008%