Explanations
Phrases related to reasoning or explanation
Instances of the words "this" and "that", along with their variations
Negative Logits
Cho         -0.72
venants     -0.70
lets        -0.68
\\\\\\\\    -0.66
leon        -0.66
mare        -0.64
marks       -0.64
fl          -0.63
ェ          -0.63
ンジ        -0.62
Positive Logits
sake        1.67
purposes    1.38
purpose     1.07
reasons     1.02
ummies      1.01
reason      0.89
occasion    0.86
foreseeable 0.85
particular  0.80
upcoming    0.78
Activation Density: 0.067%