INDEX
Explanations
phrases related to levels, measurements, and assessments of various phenomena
Negative Logits
Token       Logit
ertain      -0.59
agog        -0.58
rums        -0.55
ャ          -0.54
agen        -0.53
ラン        -0.52
azeera      -0.50
spoon       -0.50
bert        -0.50
ead         -0.49
Positive Logits
Token       Logit
destro      0.81
Azerb       0.76
rul         0.74
withd       0.73
confir      0.69
apiece      0.68
explan      0.66
altogether  0.64
redes       0.63
proport     0.63
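The page does not say how these logit lists are produced. SAE dashboards commonly compute them by projecting the feature's decoder direction through the model's unembedding matrix and listing the most negative and most positive vocabulary entries. The sketch below assumes that convention. The function names `logit_effects` and `top_and_bottom`, and all numbers in the toy example, are hypothetical and are not taken from the real model.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction through the unembedding.

    decoder_dir has length d_model; W_U has d_model rows and vocab_size columns.
    Returns one logit contribution per vocabulary entry (assumed convention).
    """
    if len(decoder_dir) != len(W_U):
        raise ValueError("decoder_dir length must equal the number of rows of W_U")
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[r] * W_U[r][c] for r in range(len(decoder_dir)))
        for c in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example with made-up numbers (hypothetical, NOT the real model's weights).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
W_U = [[0.6, 0.5, -0.4, -0.3],
       [0.4, 0.5, -0.2, -0.5]]

effects = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

This sketch uses plain Python lists so that it has no dependencies. A real dashboard would run the same projection as a single matrix product over the full vocabulary.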
Activation Density: 0.413%
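Activation density is usually read as the share of sampled tokens on which the feature fires. Under that reading, 0.413% means the feature is nonzero on roughly 4 of every 1,000 tokens. The sketch below assumes that definition and a firing threshold of zero. The helper name `activation_density` and the toy sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample: 1000 token activations, 4 of them nonzero (made-up values).
sample = [0.0] * 996 + [1.2, 0.3, 2.0, 0.7]
print(f"{activation_density(sample):.3f}%")
```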