Explanations
questions or unknown information
the word "what" and its repeated usage in various contexts
Negative Logits
Token      Logit
robe       -0.71
enburg     -0.69
aches      -0.69
ulic       -0.68
ubs        -0.67
por        -0.64
trop       -0.63
cean       -0.63
ped        -0.63
gar        -0.63
Positive Logits
Token         Logit
soever        1.16
happened      1.13
happens       1.11
sorts         1.10
happ          1.05
kinds         1.03
transpired    0.93
else          0.86
exactly       0.85
constitutes   0.84
Activations Density: 0.118%