INDEX
Explanations
occurrences of the word "in"
Negative Logits
sofar           -0.21
relation        -0.20
relation        -0.19
ved             -0.19
reo             -0.18
agar            -0.17
Relation        -0.17
duct            -0.17
regards         -0.16
inder           -0.16
Positive Logits
truth            0.27
typical          0.21
true             0.19
truth            0.18
reality          0.18
istrovství       0.16
characteristic   0.15
一页             0.15
spirit           0.15
_truth           0.15
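The dashboard does not say how these logit scores are produced. A common approach, assumed here and not confirmed by the page, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the lowest and highest scores. The sketch below uses a hypothetical toy vocabulary and direction; real dashboards may also apply normalization. Tokens are kept as (string, vector) pairs because distinct token ids can print identically (e.g. "truth" appears twice above).

```python
from typing import List, Tuple

Scored = Tuple[str, float]

def logit_effects(decoder_dir: List[float],
                  vocab: List[Tuple[str, List[float]]]) -> List[Scored]:
    """Score each token by decoder_dir . unembedding_vector (assumed method)."""
    effects: List[Scored] = []
    for tok, u in vocab:
        if len(u) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects.append((tok, sum(d * x for d, x in zip(decoder_dir, u))))
    return effects

def top_and_bottom(effects: List[Scored], k: int) -> Tuple[List[Scored], List[Scored]]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # highest score first
    negative = ranked[:k]        # lowest score first
    return positive, negative

if __name__ == "__main__":
    # Hypothetical 2-dimensional toy example.
    toy_vocab = [("truth", [1.0, 0.0]), ("relation", [-1.0, 0.0]), ("spirit", [0.5, 0.5])]
    pos, neg = top_and_bottom(logit_effects([0.3, 0.1], toy_vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```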
Activations Density 0.102%
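The page reports activation density without defining it. This sketch assumes it is the percentage of sampled tokens on which the feature's activation exceeds a threshold of zero. Under that assumption, 0.102% corresponds, for example, to 102 active tokens out of 100,000. The function name and threshold parameter are illustrative, not taken from any dashboard code.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold` (assumed definition)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)

if __name__ == "__main__":
    # Hypothetical sample: 102 firing tokens among 100,000.
    sample = [0.0] * 99_898 + [1.0] * 102
    print(f"{activation_density(sample):.3f}%")
```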