Explanations
phrases related to universal themes or general statements
expressions of collective sentiment or shared experiences among people
Negative Logits
Token             Logit
qus               -0.69
edia              -0.65
ahime             -0.64
claw              -0.62
pelling           -0.62
rer               -0.62
vernment          -0.61
eln               -0.59
Advertisement     -0.59
angered           -0.58
Positive Logits
Token             Logit
except            1.16
imaginable        0.95
Tes               0.92
except            0.92
equally           0.89
alike             0.76
winner            0.72
interchangeable   0.71
conceivable       0.71
ãĤ«               0.65
Activations Density: 0.293%