INDEX
Explanations
hyphenated or punctuated phrases that indicate connections between different pieces of information
roles and names
Negative Logits
etc        -0.46
…          -0.43
……         -0.40
……         -0.38
solcher    -0.38
appunto    -0.38
solches    -0.37
other      -0.37
strip      -0.36
yordu      -0.35
Positive Logits
FunctionFlags  0.55
otheby         0.54
Former         0.54
Members        0.54
Governor       0.53
Patients       0.52
Patients       0.52
Students       0.52
Secretary      0.52
expandindo     0.52
Activations Density: 0.005%