Explanations
phrases that express the concept of novelty or transformation
Negative Logits
recent      -0.15
lení        -0.14
Latest      -0.14
remen       -0.14
recent      -0.13
güncel      -0.13
azı         -0.13
čer         -0.13
уда         -0.13
§           -0.13
Positive Logits
whole       0.78
whole       0.68
entirely    0.61
Whole       0.60
Whole       0.60
altogether  0.56
entire      0.54
Entire      0.49
completely  0.44
cał         0.39
Activations Density 0.163%