INDEX
Explanations
phrases expressing certainty or assurance
confident assertions or affirmations
Negative Logits
natureconservancy   -0.76
vernment            -0.68
âĵĺ                 -0.67
çīĪ                 -0.66
mercial             -0.66
士                  -0.64
entrusted           -0.62
çͰ                  -0.61
EGIN                -0.60
perse               -0.60
Positive Logits
ndra                 0.83
ples                 0.72
ties                 0.70
zon                  0.69
arat                 0.67
fire                 0.67
tack                 0.67
itri                 0.66
enough               0.66
rays                 0.66
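The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). One common way to produce such lists, assumed here and not confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the extremes. The sketch below uses plain Python lists; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical.

```python
def logit_effects(direction, unembed):
    """Project a (hypothetical) feature decoder direction onto each vocab column.

    direction: list of d_model floats.
    unembed:   d_model rows, each a list of vocab_size floats.
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[r] * unembed[r][c] for r in range(len(direction)))
        for c in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    bottom = [(tokens[i], effects[i]) for i in order[:k]]            # most negative first
    top = [(tokens[i], effects[i]) for i in reversed(order[-k:])]    # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (illustrative only).
    effects = logit_effects([1.0, -0.5], [[0.2, -0.4, 0.6], [0.4, 0.2, -0.2]])
    top, bottom = top_and_bottom(effects, ["a", "b", "c"], 1)
    print("positive:", top)
    print("negative:", bottom)
```

On a real model the same product would run over the full vocabulary, and k = 10 would give lists shaped like the two above.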
Activations Density 0.015%
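A density of 0.015% means the feature fires on roughly 15 of every 100,000 tokens (about 1 token in 6,700). A minimal sketch, assuming density is the percentage of tokens whose activation exceeds zero; the threshold and the name `activation_density` are assumptions, since the page does not state how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The zero threshold is an assumption for illustration.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 2 active tokens out of 10,000 -> 0.02%, in the same range as 0.015%.
    acts = [0.0] * 9998 + [0.3, 1.2]
    print(f"{activation_density(acts):.3f}%")
```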