Explanations
occurrences of the word "I" and its related forms, indicating a focus on personal experiences or perspectives
Negative Logits
pong           -0.14
wand           -0.14
AppState       -0.14
fans           -0.13
ök             -0.13
lou            -0.13
-us            -0.13
APPER          -0.13
WARDED         -0.13
lk             -0.13
Positive Logits
ample          0.17
               0.16
Morav          0.15
alike          0.15
ager           0.14
ihil           0.14
HAL            0.14
istr           0.14
SBATCH         0.14
.MouseAdapter  0.13
Activations Density 0.015%