Explanations
declarations of official actions or decisions
Negative Logits
heit         -0.74
boro         -0.68
favorably    -0.66
utilizing    -0.65
ĪĴ           -0.63
reetings     -0.61
kees         -0.60
hydra        -0.59
Versus       -0.59
hemoth       -0.58
Positive Logits
Scroll       0.96
Labour       0.89
Shape        0.77
However      0.77
Inqu         0.74
poll         0.73
Asked        0.71
Speaking     0.70
Writing      0.70
Labour       0.68
Activations Density 0.219%