INDEX
Explanations
phrases indicating newcomers or individuals who are seeking help or guidance
Negative Logits
Token       Logit
245         -0.18
Bast        -0.15
Current     -0.15
éϵ          -0.15
gone        -0.15
founding    -0.15
Stokes      -0.14
at          -0.14
Recent      -0.14
Barber      -0.14
Positive Logits
Token       Logit
icode       0.18
fold        0.17
folds       0.15
unfamiliar  0.15
fleet       0.15
yne         0.15
ushima      0.14
ANCH        0.14
ipay        0.14
agara       0.14
Activations Density: 0.029%