Explanations
questions within text
the word "What" indicating questions or inquiries
Negative Logits
Token          Logit
shore          -0.65
heter          -0.63
ulic           -0.62
fish           -0.61
println        -0.59
Lago           -0.58
lich           -0.58
ped            -0.57
POR            -0.57
atory          -0.57
Positive Logits
Token          Logit
soever         1.39
happens        1.16
happened       1.04
happ           1.00
distinguishes  0.91
transpired     0.89
exactly        0.88
kinds          0.87
constitutes    0.86
else           0.86
Activation Density: 0.082%