INDEX
Explanations
phrases indicating future actions or possibilities
Negative Logits
discovering    -0.94
Discovering    -0.78
Finding        -0.76
Discovering    -0.75
discovered     -0.74
locating       -0.73
Discovered     -0.72
discovery      -0.72
Finding        -0.71
Searching      -0.66
Positive Logits
fin      0.75
fins     0.72
fund     0.70
fines    0.69
fond     0.69
fiind    0.65
FIN      0.63
fine     0.62
fina     0.61
fined    0.59
Activations Density: 0.336%