Explanations
multiple conjunctions and references to both nouns and claims
Negative Logits
Token            Logit
[token missing]  -0.38
E                -0.37
COMPAT           -0.34
Adam             -0.34
load             -0.34
Format           -0.33
Adam             -0.32
稔               -0.32
根               -0.32
一郎             -0.32
Positive Logits
Token            Logit
!*\              0.91
ſte              0.90
تضيفلها          0.88
itſelf           0.88
########.        0.88
ſch              0.86
Efq              0.85
bezeichneter     0.85
#+#              0.83
ſche             0.83
Activations Density 0.455%