Explanations
names, places, and abstract concepts
Negative Logits
y 0.77
ة 0.74
ed 0.72
0 0.68
[token not rendered] 0.64
[token not rendered] 0.62
& 0.62
( 0.61
a 0.59
is 0.58
Positive Logits
pissed 0.96
disagreeable 0.83
ulé 0.80
光滑 0.79
alarmed 0.79
Букмекерлар 0.79
españa 0.79
扑 0.79
ಲ್ 0.78
铑 0.78
Activation Density 0.000%