Explanations
words related to "understanding" and "knowledge"
the occurrence of the substring "und"
Negative Logits
ドラゴン   -0.96
======     -0.83
ドラ       -0.79
Ò          -0.77
utic       -0.76
nect       -0.75
toget      -0.69
GY         -0.68
シャ       -0.65
APE        -0.64
Positive Logits
erest      1.01
efined     0.99
ecided     0.90
oubtedly   0.89
lings      0.89
igan       0.88
rawn       0.82
oing       0.81
ocumented  0.80
ling       0.77
Activation Density 0.016%
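Some entries in the logit lists come from a byte-level BPE vocabulary, where a token's raw spelling (for example ãĥīãĥ©ãĤ´ãĥ³ for ドラゴン, or ãĤ·ãĥ£ for シャ) encodes UTF-8 bytes as printable characters. The sketch below assumes a GPT-2-style tokenizer (the model is not named in this card) and reimplements GPT-2's byte-to-character table to turn such raw token strings back into text. A token like Ò is a lone UTF-8 lead byte, so it does not decode to a complete character on its own and comes out as the replacement character.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte 0..255 to a printable character.

    Assumption: the tokenizer behind this card uses this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Remaining bytes (control chars, space, etc.) map to code points from 256 upward.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a raw byte-level BPE token string into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Partial multi-byte sequences (e.g. a lone lead byte) become U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥīãĥ©ãĤ´ãĥ³", "ãĥīãĥ©", "ãĤ·ãĥ£", "Ò", "erest"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as erest pass through unchanged, because printable ASCII bytes map to themselves in the table.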