Explanations
instances of self-awareness and realization
Negative Logits
Token           Logit
LabelTagHelper  -0.83
opoly           -0.61
veck            -0.60
bross           -0.60
Buchanan        -0.58
aguja           -0.57
édrale          -0.56
vej             -0.55
heated          -0.55
medes           -0.55
Positive Logits
Token           Logit
realize         1.71
realized        1.63
realise         1.61
realizes        1.60
realization     1.59
realises        1.52
realised        1.51
realizing       1.50
realisation     1.44
realising       1.38
Activations Density: 0.049%
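
The density figure is presumably the share of sampled tokens on which this feature has a nonzero activation; the page does not define it, so that reading is an assumption. Under that assumption, a minimal sketch of the calculation follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.049% would mean the feature fires on roughly 5 tokens out of every 10,000, which fits a feature tied to a narrow family of words such as "realize" and its variants.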