Explanations
references to diaries and diary entries
Negative Logits
Token     Logit
Schne     -0.44
Mun       -0.43
masculin  -0.42
Steffen   -0.41
Mun       -0.41
n         -0.41
מן        -0.40
sen       -0.40
Black     -0.39
xC        -0.39
Positive Logits
Token     Logit
diary     2.16
Diary     2.14
Diary     1.97
diaries   1.76
diary     1.73
Diaries   1.59
日记      0.98
diário    0.94
Diario    0.94
日記      0.90
Activations Density: 0.005%